





VOLUME XXVI NEW SERIES, NO. 173A 


JOURNAL 


OF THE 


AMERICAN STATISTICAL 
ASSOCIATION 


MARCH, 1931 
Supplement 


PAPERS AND PROCEEDINGS 
OF THE 
NINETY-SECOND ANNUAL MEETING 
OF THE 
AMERICAN STATISTICAL ASSOCIATION 


Edited by 
FRANK ALEXANDER ROSS 


Held at Cleveland, Ohio 
December 29-31 
Nineteen Hundred and Thirty 


EDITORIAL OFFICE: Columbia University, New York
PUBLICATION OFFICE: Rumford Press, Concord, N. H.


Price $2.00 per copy 




















NINETY-SECOND ANNUAL MEETING 


OF THE AMERICAN STATISTICAL ASSOCIATION 
Held at Statler Hotel, Cleveland, Ohio


Monday, December 29, to Wednesday, December 31, 1930
PROGRAM 


MONDAY, DECEMBER 29 
10:00 A. M. Section I 
Topic: 
Statistics in Specific Industries 
Chairman: 
Bradford B. Smith, Vice-President, American Statistical Association 
Papers: 
Changing Trends in the Building Industry 
William C. Clark, Vice-President, S. W. Straus and Company 1
Cycles in the Automobile Pneumatic Tire Renewal Market in the United States 
Royal E. Davis, Goodyear Tire and Rubber Company 
The Outlook for the Automobile Industry 
John W. Scoville, Chrysler Corporation * 
Distribution Statistics in Coal Market Analysis 
W. H. Young and Fred G. Tryon, Coal Division, United States Bureau of 
Mines 
10:00 A.M. Section II 
Topic: 
Statistical Methodology 
Chairman: 
Henry L. Rietz, University of Iowa 
Papers: 
Principles of Statistical Methodology 
Arthur R. Crathorne, Vice-President, American Statistical Association 
The Concept and Utility of Frequency Distributions
Harry C. Carver, University of Michigan 
Discussion led by Burton H. Camp 
Statistical Method from an Engineering Viewpoint 
Walter A. Shewhart, Bell Telephone Laboratories * 
10:00 A.M. Section III 
Round Table Discussion of Problems of Gathering and 
Analyzing Real Estate Vacancy Statistics 
Chairman: 
John R. Riggleman, United States Department of Commerce, Division of Build- 
ing and Housing 
1 No manuscript submitted.
2 Published in Automobile Topics, January 10, 1931.
* Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.














American Statistical Association 


Papers: 

The Technique of Gathering and Tabulating Vacancy Data 
Bernard J. Newman, Philadelphia Housing Association 

Definition and Classification of Vacancy Data for Purposes of Analysis 
F. L. Carmichael, University of Denver 

Problems in Analyzing Vacancy Statistics 
John D. Bushnell, Eberle Economic Service 

Practical Uses of Vacancy Statistics 
H. Morton Bodfish, United States Building and Loan League 


General Discussion 


10:00 A. M. Section IV 
Round Table Discussion of the Relation of the American 
Statistical Association to International Statistics 
Chairman: 
E. Dana Durand, United States Tariff Commission, Washington, D. C. 
Discussion led by Walter F. Willcox, Cornell University 


2:00 P.M. Section I 
Joint Meeting of the Society, Sections A and K of the 
American Association for the Advancement of Science and 
the American Statistical Association 
Topic: 
Statistical Methodology 
Chairman: 
A. R. Crathorne, Vice-President, American Statistical Association





Papers: 

A Simple Theory of Economic Crises 
G. C. Evans, Rice Institute 
Discussion led by Henry Schultz, University of Chicago 

A Method of Decomposing an Empirical Series into its Cyclical and Progressive
Components
Ragnar Frisch, Yale University
Discussion led by Oystein Ore, Yale University * 

Recent Improvements in Statistical Inference 
Harold Hotelling, Leland Stanford University 
Discussion led by Walter A. Shewhart, Bell Telephone Laboratories 


2:30 P.M. Section II 
Joint Meeting with the American Sociological Society 

Topic:
Institutional Statistics 

Chairman: 
Elizabeth C. Tandy, United States Department of Labor 

Paper: 
Welfare and Institutional Statistics in the United States 

Horatio M. Pollock, New York Department of Mental Hygiene 





1 No manuscript submitted. 













Proceedings 


Discussion: 
Neil A. Dayton, Massachusetts Department of Mental Diseases 1
Neva R. Deardorff, The Welfare Council of New York City * 
Bennet Mead, United States Department of Justice 
Paper: 
A Statistico-Legal Study of the Divorce Problem 
Leon C. Marshall, Institute of Law, Johns Hopkins University 


Discussion: 
R. Clyde White, Bureau of Social Research, University of Indiana 
Bennet Mead, United States Department of Justice 


2:30 P.M. Section III 

Topic: 
Wholesale Commodity Price Indices 

Chairman: 
Irving Fisher, Yale University 

Papers: 
Morris Copeland, University of Michigan 
Frederick C. Mills, National Bureau of Economic Research 
Robert W. Burgess, Western Electric Company 
Louis H. Bean, United States Department of Agriculture 1


8:00 P. M. 
Topic: 
Security Market Analysis and Forecasting 
Chairman: 
Leonard P. Ayres, Vice-President, Cleveland Trust Company 
Papers: 
Tests Applied to an Index of the Price Level for Industrial Stocks 
Edgar Lawrence Smith, Irving Investors Management Company, Inc. 
Technical Action—An Attitude Towards the Market 
James F. Hughes, Otis and Company 1
Analyzing the Relationship between Stock Prices, Earnings, and Dividends
Willford I. King, Secretary-Treasurer, American Statistical Association
A Reappraisal of the Technique of Forecasting Speculative Fluctuations
Lionel D. Edie, American Capital Corporation 1
Stock Market Revivals in Relation to Business Movements
Alexander Sachs, Lehman Corporation 1


TUESDAY, DECEMBER 30 
9:00 A. M. 


Business Meeting 
10:00 A.M. Section I 
Joint Meeting with the American Sociological Society 
Topic: 
The Observability of Social Phenomena with Respect to Statistical Analysis 2


1 No manuscript submitted.
2 To appear later in a monograph published by the Yale University Press.










Chairman: 
Robert H. Coats, Vice-President, American Statistical Association 
Paper: 
The Observability of Social Phenomena with Respect to Statistical Analysis 
Dorothy Swaine Thomas, Yale University 
Discussion: 
F. Stuart Chapin, University of Minnesota 
James W. Woodard, University of Pennsylvania 
Mortimer Adler, University of Chicago 
Edwin B. Wilson, Harvard University 


10:00 A. M. Section II 
Topic: 
The Progress of Financial Statistics 


Chairman: 
Malcolm C. Rorty, President, American Statistical Association 


Papers: 
Progress of Banking Statistics 
Emanuel A. Goldenweiser, Federal Reserve Board 
Progress of Money Market Statistics 
W. Randolph Burgess, Vice-President, American Statistical Association 
Measuring the Stock Market 
J. Edward Meeker, New York Stock Exchange 1


Discussion led by Robert Warren, Case, Pomeroy and Company 2


10:00 A. M. Section III 
Topic: 
Statistical Methods and Applications in Biology, Psychology, Education and 
Political Science 


Chairman: 
Edward L. Thorndike, Vice-President, American Statistical Association 
Papers: 
Statistical Methods in Biology 
Sewall Wright, University of Chicago 
Statistical Methods Applied to Psychological Problems 
Truman L. Kelley, Harvard University 
Statistical Methods in Personality Studies: Reliability 
Mark A. May, Yale University 
Statistical Methods in College Administration 
Herbert A. Toops, Ohio State University 
Uniformity in Defining, Recording and Reporting Statistical Items 
Frank M. Phillips, United States Employees’ Compensation Commission 
Statistical Methods in Political Science (The Larger Significance of the Literary 
Digest Prohibition Polls) 
Walter F. Willcox, Cornell University * 


1 Published in pamphlet form by the New York Stock Exchange. 

2 No manuscript submitted. 

* Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.





10:00 A. M. Section IV 
Topic: 
Statistical Methodology 
Chairman: 
T. C. Fry, Bell Telephone Laboratories 
Papers: 
Homogeneity and Stability. L. von Bortkiewicz, University of Berlin 1
Read by Arthur R. Crathorne, Vice-President, American Statistical Association
Some Remarks on Applications of Recently Developed Theory for Small Samples
Henry L. Rietz, University of Iowa 2
Discussion led by Paul R. Rider, Washington University * 
Use and Misuse of Small Samples 
Arne Fisher * 


12:30 P. M. 
Joint Luncheon Meeting with Social Science Abstracts of 
the Social Science Research Council 
Topic:
Social Science Abstracts as a Tool of Research 4
Chairman: 
F. Stuart Chapin, University of Minnesota 
Special Guest and Speaker, W. C. Curtis, Division of Biology and Agriculture of 
the National Research Council 
Papers: 
Hornell Hart, Bryn Mawr College 


Niles Carpenter, University of Buffalo 
Susan Kingsbury, Bryn Mawr College 
Royal Meeker, Yale University 
Esther Cole, University of Kentucky 
C. G. Fenwick, Bryn Mawr College 


2:30 P.M. Section I 
Joint Meeting with the American Association for Labor Legislation 
Topic: 
Measurements of Employment and Unemployment 5


Chairman: 
Otto T. Mallery, American Association for Labor Legislation 


Papers: 
The Federal Unemployment Census 
Mary van Kleeck, Russell Sage Foundation 
What We Knew Currently About Unemployment in 1930—Summary 
Bryce M. Stewart, Executive Secretary, Committee on Governmental Labor 
Statistics, American Statistical Association 


1 To be published in a forthcoming issue of The Annals of Mathematical Statistics.

2 Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.

* Elaboration in process of preparation.

4 The papers in this section were summarized by F. Stuart Chapin.

5 Through the withdrawal of the other manuscripts it has been possible to publish Miss van Kleeck's
manuscript in full.







(With Graphic Presentation by Charles E. Baldwin, United States Bureau of 
Labor Statistics; Eugene B. Patton, New York Department of Labor; 
Roswell F. Phelps, Massachusetts Department of Labor and Industries; 
Casimir A. Sienkiewicz, Federal Reserve Bank, Philadelphia) 

Discussion: 
D. D. Lescohier, University of Wisconsin 
Paul H. Douglas, University of Chicago 
Margaret H. Hogg, Russell Sage Foundation 
Charles E. Persons, formerly United States Bureau of the Census 
Robert H. Coats, Dominion Statistician of Canada 
Fred C. Croxton, State Department of Industrial Relations, Ohio 
R. D. Cahn, Chicago Tribune 


2:30 P.M. Section II 
Topic: 
Statistical Applications in the Natural Sciences 
Chairman: 
Edwin B. Wilson, Harvard University 


Papers: 
Statistical Theory of Evolution 
Sewall Wright, University of Chicago 
Place of Statistics and the Field of Probability in Present-Day Physics 
W. F. G. Swann, Bartol Research Foundation, Franklin Institute 1
The Luminosity of the Stars 
Jan Schilt, Yale University 
Applications of Statistical Method in Engineering 
Walter A. Shewhart, Bell Telephone Laboratories 
Discussion: 
Edwin B. Wilson, Harvard University 2
2:30 P.M. Section III 


Topic: 
Statistical Methodology 

Chairman: 
Harry C. Carver, University of Michigan 


Papers: 
The Normal Hypothesis 
Burton H. Camp, Wesleyan University 
Discussion led by Harry C. Carver * 
Bayes' Theorem—An Expository Presentation
Edward C. Molina, American Telephone and Telegraph Company *
Discussion led by T. C. Fry, Bell Telephone Laboratories *
Classification of Sizes or Measures by Frequency Functions
Edward L. Dodd, University of Texas
Discussion led by Harry C. Carver 2


1 Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.

2 No manuscript submitted.

* To be published in a forthcoming issue of The Annals of Mathematical Statistics.







2:30 P.M. Section IV 
Joint Meeting with the American Farm Economic Association 
Chairman: 
H. C. M. Case, President, American Farm Economic Association 
Papers: 
The New Agricultural Policy of Russia 
Vladimir Timoshenko, University of Michigan 1
The 1930 Census of Agriculture
F. F. Elliott, Bureau of the Census 2
The Agricultural Situation and Its Effect on Business in 1931 
Louis H. Bean, United States Department of Agriculture 
A Monthly Index Number of Wholesale Prices in the United States for 135 Years 
George F. Warren and Frank A. Pearson, New York State College of 
Agriculture 


WEDNESDAY, DECEMBER 31 
9:00 A. M. 
Business Meeting 
10:00 A. M. Section I 
Joint Meeting with the American Economic Association 
Topic: 
The Business Depression of 1930 3
Chairman: 
Wesley C. Mitchell, Columbia University 
Papers: 
Carl Snyder, Federal Reserve Bank of New York 
Josef Schumpeter, Harvard University 
Discussion: 
Arthur B. Adams, University of Oklahoma 
Joseph Demmery, University of Washington 
Carter Goodrich, University of Michigan 
Willard L. Thorp, Amherst College 
Leslie Hayford, General Motors Corporation 
Alvin H. Hansen, University of Minnesota 
10:00 A. M. Section II 
Round Table Discussion of Statistical Terms that Require 
More Exact Definition 4
Chairman: 
Edwin W. Kopf, Metropolitan Life Insurance Company 
10:00 A. M. Section III 
Topic: 
Statistical Methodology 
Chairman: 
Edwin B. Wilson, Harvard University 
1 To be published in The Journal of Farm Economics, April, 1931, Vol. 13, No. 2.
2 Published in The Journal of Farm Economics, January, 1931, Vol. 13, No. 1.
3 Published in Supplement to the March issue of The Economic Review, March, 1931.
4 No manuscript submitted.










Papers: 
Correlation and Association 
Edwin B. Wilson, Harvard University 
Multiple Correlation for Prediction Purposes 


Dinsmore Alter, University of Kansas 
Exhibition of the Cosmograph. By B. Lewis Padgett 


12:30 P. M. Luncheon 
Topic: 
Review of 1930 and Forecast for 1931 1


Chairman: 
R. J. Bulkley, Congressman from Ohio 


Papers: 
Josef Schumpeter, Harvard University 
Lionel D. Edie, American Capital Corporation 
Leonard P. Ayres, Cleveland Trust Company 


2:30 P. M. 
Presidential Address 
Malcolm C. Rorty, President, American Statistical Association 2


Chairman: 
Leonard P. Ayres, Cleveland Trust Company 
3:00 P. M. Section I 
Topic: 
Analysis and Forecasting of Business Cycles 
Chairman: 
Malcolm C. Rorty, President, American Statistical Association 


Discussion: 
The Characteristic Course of Business Cycles 
Wesley C. Mitchell, Columbia University 1
The Relations of Credit and Trade
Carl Snyder, Federal Reserve Bank of New York 3
A Forecasting Index for Business
Bradford B. Smith, Cleveland Trust Company 4
3:00 P. M. Section II 


Topic: 
Enumeration and Sampling in the Field of the Census—Vital Statistics 


Chairman: 
Robert H. Coats, Vice-President, American Statistical Association 


Paper: 
Enumeration and Sampling in the Field of the Census 
Robert H. Coats, Dominion Statistician, Canada 4


1 No manuscript submitted.
2 Published in the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, March, 1931, Vol. 26, p. 1.
3 Published in part as "New Measures of the Relations of Credit and Trade," in the Proceedings of
the Academy of Political Science, January, 1930, Vol. 13, No. 4; and as "The Future of Business
Cycles," in the Annals of the American Academy of Political and Social Science, May, 1930, Vol. 149.
4 Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.







Discussion: 
E. Dana Durand, United States Tariff Commission 1
William F. Ogburn, University of Chicago 1
Papers:
Metropolitan Life Insurance Company Statistics
Louis I. Dublin, Metropolitan Life Insurance Company 1
Causes of Birth-Rate Fluctuations
Harold Hotelling, Stanford University 2


3:00 P. M. Section III 


Round Table Discussion of the Teaching of Social Statistics; A Joint 
Session of the Committee on Social Statistics of The American Sta- 
tistical Association, The Committee on Sociology and Social Work, 
and The Committee on the Teaching of Elementary Sociology of The 
American Sociological Society 
Chairman: 
M. J. Karpf, Training School for Jewish Social Work 
Discussion led by Ralph G. Hurlin, Russell Sage Foundation 


1 No manuscript submitted.
2 Revision to appear in a forthcoming issue of the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION.







CYCLES IN THE AUTOMOBILE PNEUMATIC TIRE 
RENEWAL MARKET IN THE UNITED STATES 


By Royal E. Davis


Thirty-eight years ago a new chapter began in the epic history of 
American transportation—the chapter of the motor vehicle. Every- 
one is familiar with the manner in which the automobile has altered the 
entire structure of our national system of transportation. 

The history of the automobile tire parallels that of the automobile.
American tire manufacturers have played a significant part in the
expansion and development of the automobile industry. In fact, there
has been the closest coöperation from the beginning. The tire industry
has added to the comfort and efficiency of automobile transportation by
developing the demountable rim and the clincher, cord, straight side,
balloon and pneumatic truck tires. Before the War, a tire would travel
approximately 3,000 miles. Through improvement in technique and
research the tire industry has been able to give the public a tire
which will travel 15,000 to 20,000 miles on the average. Besides giving
longer mileage, tires have been steadily reduced in price until now
tire prices are the lowest in the history of the industry.

This paper presents the statistical technique and the interpretation
of an index I have constructed of renewal pneumatic tire sales activity
in the United States. In the construction of any index one of the first
problems encountered is that of securing adequate statistical
information. Since the Rubber Division of the Department of
Commerce was not established until August, 1921, and the Rubber
Manufacturers Association did not report monthly tire sales until 1921,
it is impossible to secure monthly rubber tire statistics prior to that
year.

The value of all rubber products sold in 1929 amounted to
$1,055,165,000. Thus the rubber industry has become one of the
billion dollar industries of this country. Tires and tire sundries
constituted more than 70 per cent of the value of all rubber products
sold in the United States in 1929.

Renewal sales, or sales to tire dealers, have increased from 21,100,000
tires in 1921 to 49,505,000 in 1928, an increase of 135 per cent. Original
equipment sales, or tires sold to automobile manufacturers, form a much
smaller market, which is entirely dependent upon the number of
automobiles produced yearly. Original equipment sales are expected to
range from 14,000,000 to 20,000,000 tires per year in the near future.
Shipments to export, while important, represent a small per cent of the
total tires produced in this country. This market, like the original 
equipment market, has reached a stabilized level of approximately 
3,000,000 tires per year. Due to unfavorable tariff policies, several 
American plants are already established abroad. Extension in these, 
and the establishment of others, will undoubtedly be sufficient to ab- 
sorb the growth in the foreign market. 

I have called your attention to the significance of tires in the total
value of rubber products sold. Renewal tire sales represent not only
the largest of the three tire markets, but also the fastest growing one.

For these reasons I have constructed this tire activity index on the 
basis of renewal pneumatic unit tire sales. And since sales have more 
forecasting value than production, this index reflects the tone of the 
market rather than the activity of the factory. 


CHART I

RENEWAL PNEUMATIC TIRE SALES, UNITED STATES

AVERAGE DAILY, 100 PER CENT OF INDUSTRY

[Chart: thousands of tires, vertical scale 40 to 220; horizontal scale 1921 to 1932.]


Chart I shows the renewal pneumatic unit tire sales in the United 
States from January, 1921, to date. These figures are based upon the 
monthly reports of the Rubber Manufacturers Association, adjusted by 
a shifting amount to represent 100 per cent of the industry. These 
sales are on an average daily basis, with allowance for Sundays and the 
six legal holidays observed in the tire industry. 
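
The adjustment just described can be illustrated by a minimal sketch in Python. It is not the author's worksheet: the function names, the coverage fraction and the sales figure are assumptions introduced for illustration, and the paper does not list which six holidays were observed, so the caller must supply them.

```python
import calendar
from datetime import date


def working_days(year, month, holidays=()):
    """Count the days of a month that are neither Sundays nor listed holidays."""
    days_in_month = calendar.monthrange(year, month)[1]
    count = 0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        if current.weekday() == 6:  # Monday is 0, so 6 is Sunday
            continue
        if current in holidays:
            continue
        count += 1
    return count


def average_daily_sales(reported_units, coverage, year, month, holidays=()):
    """Raise reported sales to 100 per cent of the industry, then divide by working days.

    coverage is the assumed fraction of industry sales covered by the reports.
    """
    industry_units = reported_units / coverage
    return industry_units / working_days(year, month, holidays)


# Hypothetical figures: 1,500,000 tires reported, reports covering 75 per cent
# of the industry, New Year's Day treated as one of the legal holidays.
new_year = {date(1921, 1, 1)}
print(working_days(1921, 1, new_year))
print(average_daily_sales(1_500_000, 0.75, 1921, 1, new_year))
```

Because the coverage of the reports shifted over time, a separate coverage fraction would be passed for each month rather than a single constant.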

The trend of average daily renewal unit tire sales in the United States 
was rather sharply upward from 1921 to 1925. Between 1925 and 1928
it continued upward, but at a decreasing rate. Since 1928 there has
been a greater tendency for the trend to flatten out. While this means
that renewal tire sales have increased at a decreasing rate, it does
not mean that the total amount of rubber consumed, the dollar value
or the poundage output has slowed down in proportion.

All other things being equal, the trend of renewal tire sales is de- 
pendent upon the number of automobiles registered in the United 
States. With the exception of the past two years, it compares favor- 
ably with the trend of motor vehicle registration. The downward 
tendency in tire sales during the last two years in spite of the greater 
automobile registration will be discussed further in another connection. 

The trend of renewal pneumatic tire sales will be considered in greater 
detail because of its bearing upon the final index. Moreover, it is 
affected by several interesting economic facts which need to be touched 
upon in order to interpret some of the basic developments in the 
industry. 

The trend began to show the first sign of slowing down in 1925. 
This was due fundamentally to the use of balloon tires, which came into 
existence in 1923, and have increased rapidly in number. According to 
the Rubber Manufacturers Association, for the first nine months of 
1930, they amounted to 82.8 per cent of the total passenger car renewal 
tire sales. 

Another force has been making for greater average life of automobile 
tires—that of surfaced roads—for the growth of surfaced mileage has 
been rapid, particularly since 1921, and tests clearly indicate that tires 
give greater mileage on hard surfaced roads. On the other hand, this 
factor is partly offset by the increase in annual average driving mileage, 
as evidenced by the increase in gasoline consumed per car. 

In addition to the increase in the average mileage life of balloon tires 
as compared with high pressure tires, and some net gain as a result of 
the growth in surfaced road mileage, tire prices, especially since 1928, 
have indirectly influenced the rate of increase of renewal pneumatic 
tire sales. Since 1926 prices have fallen sharply. Current tire
prices are the lowest in the history of the industry, 65 per cent below
1914, while all wholesale commodity prices are 25 per cent higher than in that year.

Lower tire prices and marked improvement in the quality of the 
product are largely responsible for another condition making for a 
lower tire consumption per car per year in recent years—namely, a 
shift to better grades of tires. The leading tire companies manufacture 
four to six grades of tires. The lowest grade gives 5,000 to 8,000 miles, 
while the highest gives 25,000 to 50,000. Any decrease in the popularity
of the lower grades in favor of the better grades has a significant relation
to the renewal sales of two or three years later. Such a shift has
been noticeable since 1928. 

There has been a marked movement toward six ply tires since 1928.
Taking all sizes together, their share was 24.6 per cent in 1928, 29.2 per cent in 1929 and
37.6 per cent during the first ten months of 1930. While the prolonged
depression in general business since the middle of 1929 has made itself 
felt on renewal tire sales to a very large degree, I consider this increase 
in heavy duty tires a more logical explanation of the tendency of the 
trend line to flatten out since 1928. From 1925 to 1928 it was slowed 
down by the rise of the balloon tire and the corresponding decline of 
the high pressure. This influence has continued to date, but the com- 
ing of the heavy duty tire, which gives several thousand additional 
miles, is the more recent factor responsible for the slowing down in the 
trend line of renewal tire sales. 

Another factor which has played no small part in the improvement of 
tire quality since 1921 is the reduction in the number of active tire 
companies in the United States. The peak was reached in 1922, when 
there were 166 active tire companies and 70 or 80 which were inactive. 
Due to fluctuations in crude rubber prices, the steady decline in tire 
prices and the competitive situation, the mortality of tire companies 
has been extremely high since 1922. For the most part those which 
have passed out were high cost producers. Approximately 370 tire 
companies have been in existence at some time in the history of the tire 
industry. Most of these have failed, merged or are no longer manu- 
facturing tires. According to the best available information 28 tire 
companies are now in operation under their own management. 

The number of different types and sizes of both high pressure and 
balloon tires has increased considerably since 1921, so that some 
companies manufacturing a full line have several hundred types and 
sizes. This overhead in mold and inventory expense, as well as the 
constant and radical changes in types and sizes, has made tire manu- 
facturing increasingly unprofitable for companies poorly managed and 
financed. Thus the tire market has been reapportioned among the 
stronger companies, and the consumer has benefited by the improved 
quality of tires and by the low retail prices which have prevailed as
a result of the keen competition in the industry.

Due to the displacing of square-woven fabric by cord fabric, con- 
sumption dropped sharply from 5.4 tires per car in 1917 to 2.3 in 1920. 
The trend has been less sharply downward since then. Approximately 
1.4 tires per car will be sold in 1930, which is the lowest per car figure on 
record. 



















The next factor has to do with seasonal variation. After testing the 
generally accepted methods for computing seasonal variation, I was 
not satisfied that any of these indexes reflected the true picture. 
Carrying these researches further, I found that the progressive seasonal 
method proved very satisfactory. In determining the seasonal index 
by the progressive method, link relatives were computed for the period 
January, 1921, to date and a trend line was fitted to them for each 
month from 1921 to 1930. 
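
The first two steps of the progressive method, computing link relatives and fitting a trend to them month by month, may be sketched as follows. This is an illustration under stated assumptions and not a reconstruction of the author's computation: the paper does not say what form of trend line was used, so a least-squares straight line is assumed, and all function names and sample figures are hypothetical.

```python
def link_relatives(values):
    """Each value as a percentage of the one before it; the first has no predecessor."""
    return [None] + [100.0 * cur / prev for prev, cur in zip(values, values[1:])]


def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fitted_link_relatives(monthly_values, n_years):
    """Trend ordinates of the link relatives, one straight line per calendar month.

    monthly_values holds n_years * 12 figures in time order, starting in January.
    At least three years are needed, since the first January has no link relative.
    """
    rel = link_relatives(monthly_values)
    fitted = [[0.0] * 12 for _ in range(n_years)]
    for m in range(12):
        pts = [(y, rel[12 * y + m]) for y in range(n_years)
               if rel[12 * y + m] is not None]
        intercept, slope = fit_line([p[0] for p in pts], [p[1] for p in pts])
        for y in range(n_years):
            fitted[y][m] = intercept + slope * y
    return fitted


# Hypothetical series: the same twelve monthly figures repeated for three years,
# so every calendar month's link relative is constant and the fitted slope is zero.
pattern = [80, 90, 100, 120, 130, 140, 150, 150, 130, 110, 90, 80]
ordinates = fitted_link_relatives(pattern * 3, 3)
print(ordinates[0][:3])
```

With real data the slopes differ from zero, and it is these changing ordinates that make the seasonal index "progressive."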

There are two reasons underlying the increased importance of the 
month of January: First, due to the extensions of surfaced roads, the 
closed car and the application of the motor to business transportation, 
there has been a tendency toward year-around driving; second, and 
more significant, tire manufacturers have initiated spring dating, which 
is the policy of soliciting orders from tire dealers, beginning about 
November 1st. Spring dating has many advantages both to the dealer
and to the tire manufacturer: it enables the manufacturer to level out 
the fluctuations in production and employment; the dealer, not being 
required to settle his account until spring, is protected against price 
declines and has a complete stock of all types and sizes during the 
winter months; and, with the trend toward year-around driving, an 
adequate stock of tires on the dealers’ shelves has the psychological 
value of keeping him actively in the tire business in the winter as well 
as in the summer. 

It is the general practice in the industry to ship the major part of the 
tires ordered on spring dating terms after January 1. This provision in 
the policy accounts for a large part of the increased importance of 
January. Since many tires are shipped to dealers in January on spring 
dating terms to be sold in February and March, there has been a cor- 
responding decline in the importance of these months. April was of 
increasing importance until 1926, and has remained at approximately 
the same level since then. This is due to the fact that the spring dating 
program did not become noticeably effective until after 1926. May 
has become slightly less important, probably because dealers are re- 
quired to settle their spring dating accounts about May 15th. June 
and July, the first two months of the heavy retail buying season, have 
become of increasing importance. August has decreased slightly in 
importance, while September has increased. October has become less 
important, due to the fact that dealers let their stock get lower in order 
to make settlements for their heavy purchases during the four preced- 
ing months. November shows very little change. December is 
erratic, although there is some indication that it is becoming slightly 
more important. 
















In computing the progressive seasonal variation, the ordinates of 
trend for the twelve months were adjusted to 100 per cent for the year. 
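
The adjustment to 100 per cent can be shown in a few lines. The sketch assumes that the phrase means rescaling the twelve monthly ordinates of a given year so that they average 100; the paper does not spell this out, and the function name and figures are hypothetical.

```python
def adjust_to_100(ordinates):
    """Rescale twelve monthly ordinates so that they average 100 for the year."""
    if len(ordinates) != 12:
        raise ValueError("expected twelve monthly ordinates")
    mean = sum(ordinates) / 12
    return [100.0 * value / mean for value in ordinates]


# Hypothetical ordinates averaging 50; after adjustment they average 100
# while keeping the same proportions from month to month.
raw = [45.0] * 6 + [55.0] * 6
print(adjust_to_100(raw))
```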

In analyzing these progressive seasonals, a number of things stand 
out: the seasonal spread from the low month to the high month has 
become considerably greater from 1921 to date. In the early period 
August was the peak month; however, during this ten-year period 
there has been a tendency for July to increase in importance, so that 
after 1929, July is slightly more important than August. In 1921 
January was only slightly more important than February; now it is 
considerably more important. March is not so important in relation to 
January and February as it was in the earlier part of the period under 
consideration. June, July and August are of relatively greater im- 
portance than they were ten years ago. Due to the extension of the 
fall driving season, September has increased in importance. The 
seasonal change from September to October has become greater, for 
reasons already noted. November and December show very little 
change.

Taking the basic statistics of average daily renewal pneumatic tire
sales of the United States, correcting for trend and progressive
seasonal variation, and running a three months' moving average to
smooth out the random fluctuations, I obtained an index of the activity
of tire sales with respect to normal.
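
The construction of the index can be sketched in Python as follows. Several details are assumptions, since the paper does not state them: that trend and seasonal are removed by division, that the moving average is centered, and that the index is expressed as a per cent deviation from normal. The function name and the sample figures are hypothetical.

```python
def activity_index(actual, trend, seasonal):
    """Per cent deviation from normal, smoothed by a centered three-month average.

    actual   : average daily sales, month by month
    trend    : trend ordinates for the same months
    seasonal : seasonal index for the same months, averaging 100 over a year
    """
    deviations = [
        100.0 * a / (t * s / 100.0) - 100.0
        for a, t, s in zip(actual, trend, seasonal)
    ]
    smoothed = []
    for i in range(1, len(deviations) - 1):
        window = deviations[i - 1:i + 2]
        smoothed.append(sum(window) / 3.0)
    return smoothed


# Hypothetical figures: months 1 and 3 run 10 per cent above normal,
# months 2 and 4 exactly at normal.
trend = [100.0, 100.0, 100.0, 100.0]
seasonal = [90.0, 100.0, 110.0, 100.0]
actual = [99.0, 100.0, 121.0, 100.0]
print(activity_index(actual, trend, seasonal))
```

A centered average loses one month at each end of the series; a trailing average would keep the latest month at the cost of lagging the turning points.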

Chart II shows the index in its final form. The first cycle ran from 
January, 1921, to July, 1923; the second from July, 1923, to December, 
1925; and the third from December, 1925, to the present. Thus tire 
sales activity has passed through three cycles in the last ten years; the 
first two ended in recessions and the last is ending in a severe depression. 

A comparison of the Tire Sales Activity Index with the Annalist 
Index of Business Activity, as shown in the second frame of the chart, 
shows that tire sales fluctuate further above and below normal than 
does general business activity. General business activity has also 
made three cycles in the last ten years. The first terminated in the 
summer of 1924, the second in the closing months of 1927 and the third 
probably terminated in the closing months of 1930. While the first 
two cycles in general business activity concluded with mild recessions 
and the third cycle is closing with our present depression, the termina- 
tion of the first two cycles did not appear at the same time for renewal 
pneumatic tire sales activity and general business activity. Both, 
however, close the recent cycle with a depression. 

Comparing these two indexes further, it will be noted that tire 
activity resumes about two to six months ahead of general business 
activity in the recovery stage of the cycle, and also precedes general
business activity by about two to six months in the declining period of
the cycle. 
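
One way to measure such a lead is to correlate the two indexes at successive monthly offsets and take the offset with the highest correlation. The sketch below is an illustration only; the paper reads the lead from the chart and does not say that a correlation method was used. The function names and the sine-wave demonstration data are hypothetical.

```python
from math import pi, sin
from statistics import mean

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def best_lead(leader, follower, max_lead=12):
    """Lead k (months) maximizing corr(leader[t], follower[t + k]), plus all scores."""
    scores = {}
    for k in range(max_lead + 1):
        a = leader[:len(leader) - k]   # leader values that have a partner k months on
        b = follower[k:]               # follower values k months later
        scores[k] = correlation(a, b)
    return max(scores, key=scores.get), scores

# Hypothetical demonstration: the follower repeats the leader four months later.
wave = [sin(2 * pi * t / 30) for t in range(64)]
leader, follower = wave[4:], wave[:60]
lead, _ = best_lead(leader, follower)
```

With real indexes the peak is rarely so sharp, and a range such as two to six months is the more honest summary.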

A more detailed analysis of the cycles in renewal pneumatic tire 
sales activity is essential to an appreciation of the index. The first 
cycle from January, 1921, to July, 1923, had a definite forerunner in the 
sharp fall of crude rubber prices in 1920. With crude rubber fluctuating
at low levels between 11½ and 20 cents during 1921 and up to October,
1922, tire prices were reduced four times with a total decline of 43
per cent. These lowered prices brought out the deferred buying, and
renewal tire sales were further stimulated by the hard times in 1921, 
which kept old cars in operation for a longer period. 

CHART II
PNEUMATIC TIRE SALES ACTIVITY (first frame) AND GENERAL BUSINESS ACTIVITY, ANNALIST INDEX (second frame), 1921-1932
[Two-frame chart; the plotted values are not recoverable in this copy.]

The year 1922 was marked by extraordinary activity in which all
branches of the rubber industry participated. British crude rubber 
restriction was effective on November 1, 1922, but advance rumors had
started the price of crude rubber upward as early as September, and by
January, 1923, it had reached 37 cents. Dealers, anticipating higher
tire prices, loaded up as soon as rubber prices began to rise. This ad- 
vance in crude rubber, and two increases in tire prices (January and 
March, 1923), amounting to 27 per cent in total, brought the boom 
period of the cycle to an end. 

The sharp decline in tire activity during the first half of 1923 forced 
rubber prices downward from 37 cents in January to 18¼ cents in
June, 1924. Continuous declines in crude rubber resulted in the reduction
of tire prices four times from May, 1923, to October, 1924. Tire
activity reached normal in the first quarter of 1924. However, the 
recession in general business during the summer had been preceded a 
few months by some falling off in tire activity. 

Tire sales were extremely active during the first half of 1925, the 
boom period of the cycle. Crude rubber fell from $1.23 in July to 64 
cents in August, 1925, but recovered to close the year at $.88-$1.10 in 
December. These abnormal prices forced tire prices up five times 
during the summer with a total advance of 68 per cent—which, to- 
gether with discontinuance of spring dating terms by tire manufactur- 
ers, caused the cycle to close at 30 per cent below normal. 
















Production of finished rubber goods during 1926, as a whole, was at a 
satisfactory level. Tire prices fell 40 per cent during the year. This 
started recovery from the low level of 30 per cent below normal, and 
normal was reached by August. 

With a steady crude rubber market ranging in price from 37 to 40 
cents during 1927, tire prices were slightly reduced as tire manufactur- 
ers began to buy at the lower levels. Tire activity declined to normal 
in the third quarter, in advance of the recession in general business 
activity in the fourth quarter. These tire price cuts stimulated buying 
and the tire industry was again on the up-grade during the closing 
months of 1927. 

At the beginning of 1928 crude rubber prices were firm, around 42 
cents per pound. Announcements preceding the removal of the 
British restriction caused successive declines during February, March 
and April to a low of 16¾ cents. This condition brought two major
reductions in tire prices which, with active general business, maintained 
the tire sales at high levels. The record year for renewal tire sales was 
1928, with a total of 49,505,000 tires sold. 

Tire sales activity declined below normal in June, 1929, some months 
in advance of general business. Early in the summer of 1929 it became 
evident that tire sales activity was in for a major depression. Dealers’ 
and manufacturers’ stocks were extraordinarily heavy, largely due to 
earlier anticipation of a very active year. During the last half of 1929 
tire sales activity declined sharply and closed the year at 20 per cent 
below normal. 

Activity in tire sales showed very mild improvement during the first 
quarter of 1930. However, since that time, the trend has been sharply 
downward, until November. 

The depression phase of this last cycle is very similar for general 
business and renewal pneumatic tire sales. However, the tire industry 
has been depressed longer and at lower levels. 

In conclusion, a study of automobile tire and crude rubber prices 
reveals the fact that the price of tires is largely dependent upon the 
price of crude rubber. In the period 1917-1920, however, tire prices 
were high while crude rubber prices were comparatively low, high labor 
costs taking the toll of price. In 1922 and 1923, years of readjustment 
in the tire industry, economies in manufacturing were reflected in lower 
tire prices despite higher prices of the raw material. Higher crude 
prices in 1925 were only partially met by tire price increases and as a 
general rule, during the period of 1925 to date, crude and tire prices 
have moved together. The recessions (1923 and 1925) in the tire business
have been due to causes within the industry, while the depressions
(1921 and 1930) have been due to a combination of causes within the 
industry and have been further aggravated by influencing factors in the 
falling off of general business. 

Tire activity has moved two to six months ahead of general business 
in the past, both on the down swing and the up swing. Therefore, 
based on the outlook for the probable upturn in general business, as well 
as current dealers’ stock situation, low tire price levels, condition of 
tires on cars now in operation, etc., I would conclude that renewal tire 
sales reached the bottom in October, 1930, and that some upturn is in 
order. 

Viewing 1931, on the whole I expect renewal tire sales activity to 
improve steadily throughout the year. While this improvement is not 
likely to be so swift as the recovery following depression periods in the 
past, it seems reasonable that 1931 should be a more active year than 
1930, but less active than 1929. 










DISTRIBUTION STATISTICS IN COAL MARKET ANALYSIS 1


By W. H. Young and F. G. Tryon


It is a commonplace that less is known about distribution of goods 
than about any of the primary economic functions. This is especially 
true of the bituminous coal industry. We have excellent records of 
production, for, as with most commodities, production is the easiest 
of the market factors to measure. We are accumulating more and 
better indicators of current consumption and have developed methods 
for measuring the elusive but critically important factor of consumers’ 
stocks. Of the movement from mine to place of use, however, we know 
little. 

This situation—which is common to most American industries, 
though rendered more serious in coal by the prolonged depression of 
the world fuel markets—has created a widespread demand for statis- 
tics of distribution. Part of the information wanted will be supplied 
by the new Census of Distribution. The Census will measure for the 
first time the size of the distributive function, as indicated by persons 
employed and volume of sales, and it will show the types of wholesale 
and retail agencies by which marketing is effected. But in addition 
there is a very active demand for data on the physical movement of 
commodities from point of production to point of use, or what the 
marketing specialists call “origin and destination of shipments.” The
coal sales manager wants to know how much coal is moving out of his 
own district into each one of his potential markets. He needs such 
information in planning where to expend his sales effort, in establishing 
quotas for his branch offices and traveling salesmen, and in weighing 
the possibilities of new markets. The buyer of Boston Edison, on the 
other hand, wants to know the volume of the coal flowing into New 
England and the sources from which it comes. While the primary 
demand comes from those directly engaged in buying and selling, 
statistics of this kind are also needed in determining long-time policy 
and planning investment. They are indispensable in adjusting freight 
rates, and they throw a flood of light on the competitive relations of the 
coal fields and on the financial prospects of both the coal companies 
and the coal-carrying railroads. 

To meet this need, the Brookings Institution and the Bureau of 
Mines recently undertook a coöperative study, designed to explore the


1 Published by permission of the Director, United States Bureau of Mines, and the President, 
Brookings Institution. (Not subject to copyright.) 







problem and develop methods of measuring the origin and destination 
of shipments of coal and coke. The study is nearing completion, and 
the results are so encouraging that the Bureau of Mines has now pro- 
vided funds to establish the work on a permanent basis. It is believed 
that the methods developed can be applied to the measurement of the 
physical distribution of many other bulky commodities. 

There are three possible sources of information from which data on 
physical distribution may be obtained: (1) the producer, or his sales 
representative, (2) the consumer, and (3) the carrier. All three sources 
were tried in this study. 

In an industry consisting of a small number of large producing units 
the best source of information is often the producer. The 75 com- 
panies making by-product coke were able to give both the destination 
of their coke and the uses to which it was put.1 The producers of
Pennsylvania anthracite have at times compiled elaborate statistics of 
distribution through a trade organization known as the Anthracite 
Bureau of Information. 

In an industry comprising many small units, such as bituminous coal 
mining, the collection of distribution data from the producer is difficult 
or impossible. In a few districts local associations of coal operators or 
private statistical bureaus have kept records of distribution which have 
been very helpful to their members, but they have seldom covered all 
the tonnage of the district, and they embrace at present hardly 10 per 
cent of the national output. The reasons for this condition are not far 
to seek. There are more than 6,000 commercial mines, not to mention
thousands of country coal banks. Few of the smaller mines keep good 
accounting records. Many of them sell through wholesalers and do 
not know where their coal moves. Even the larger companies often 
sell part of their output through wholesalers, but part also goes direct 
to the consumer. Neither the operators alone nor the wholesalers 
alone, therefore, can give a complete picture of distribution, and when 
one attempts to supplement incomplete reports from the operators 
by resort to the wholesalers, he finds serious double counting where the 
same tonnage has been reported once by an operator and a second or 
even a third time by one or more wholesalers. Lesher, in the pioneer 
study of coal distribution, did collect reports from individual operators 
and wholesalers but later abandoned this source for more accurate 
methods.2 The authors also have experimented with a direct canvass
of operators in two selected districts, but the results were not satisfying. 


1 The results of this inquiry are being published by the U. S. Bureau of Mines in Coke and By-Products
in 1929, Part B, Distribution. See also U. S. Geological Survey, Coke in 1915, by C. E. Lesher. Similar
data on the destination of shipments of Portland cement are collected by the Bureau of Mines.

2 C. E. Lesher, Coal in 1915, Part B, U. S. Geological Survey, pp. 434-435.



















The second possible source of information is the consumer. Certain 
classes of large consumers, with whom coal is a major item of expense 
and who often employ a special fuel agent, can report accurately the 
sources of their purchases. The electric utilities, for example, were 
able to give the originating railroad and shipping point of practically 
all coal received, permitting exact identification of the field of origin.1
Similar reports were readily furnished by the railroads and by-product 
coke plants. But for the mass of consumers this method is out of the 
question. There are 90,000 buyers of coal in carload lots, to say nothing
of the hundreds of thousands who make up the “small steam trade”
and are served by truck from retail yards. They know how much they 
consume, but most of them have no accurate record of its source. 

To obtain a complete picture of distribution, therefore, it is necessary 
to go to the third source of information—the records of the coal-carry- 
ing roads. This was the principal source used by Lesher in studies 
which laid the basis of the Fuel Administration’s control of distribution 
during the War,2 and it has yielded by far the best results in the present
inquiry. 

Somewhere in the offices of the 177 railroads that load bituminous 
coal is a record of the origin and destination of every car of coal shipped 
during the year. If it were necessary to go back to the ten million 
waybills, the labor involved would be formidable, but, in reality, it is 
surprising how much of the task is already being done. In certain 
areas the carriers have established statistical bureaus for recording the 
ebb and flow of coal traffic. The largest of these is the Ohio Bureau of 
Coal Statistics, managed by an honored member of the American 
Statistical Association, which receives a copy of the waybill on every 
westbound car of coal from the northern and middle Appalachians. 
Its reports are very properly not published, because they are prepared 
in great detail and might reveal individual operations, but they contain 
all that is needed for public purposes. Similar railroad organizations 
cover shipments from mines in Illinois, Indiana, and western Kentucky 
and the large tonnages handled over Great Lakes and tidewater piers.3
In all, about half the production of the country is covered in this way. 

Outside the areas illuminated by these special bureaus lie vast 


1 W. H. Young, Sources of Coal and Types of Stokers and Burners used by Electric Public Utility Power
Plants, The Brookings Institution, Pamphlet Series, Vol. 2, No. 2. See also Lesher, Coal in 1917,
Part B, Distribution and Consumption, U. S. Geological Survey; and Tryon and Bennit, Coke in 1928,
U. S. Bureau of Mines.

2 C. E. Lesher, Fuel Administration, Report of the Distribution Division, Part I, The Distribution of Coal
and Coke; Part III, Statistical Tables; see also U. S. Geological Survey, Coal in 1917, Part B, and Coal
in 1918, Part B, by the same author.

3 The Illinois Freight Association, of Chicago; the Ore and Coal Exchange, of Cleveland; and the 
Tidewater Bituminous Coal Statistical Bureau, of Philadelphia. 







stretches of darkness. Yet even here the inquiry has shown that the 
traffic managers of the individual railroads usually keep some current 
record of the coal movement on their lines, and all of the carriers thus 
far approached have been able to furnish a statement which could be 
worked into a composite picture of distribution for the region as a 
whole. Occasionally the carrier has had to work up the entire state- 
ment from its waybills; more often it has had to consult the waybills 
on a part of the tonnage only, and that in order to fit its own records 
into a framework applicable to other carriers as well. 

A number of technical problems have arisen in handling the material, 
three of which deserve brief mention: (a) Coal dumped over piers into 
vessels for transhipment by the Great Lakes loses its identity and in 
general can be followed on its subsequent journey only as “Lake Cargo
Coal.” (b) A considerable tonnage of coal is reconsigned while in
transit to someone other than the original consignee, and it is often 
impracticable to trace reconsignments through, so that most of the 
record represents the original billing. The inquiry shows, however, 
that most of the reconsignments go to destinations within the same 
consuming area as the original destination, and hence the error in the 
final result is believed to be small. (c) The most serious problem has 
centered around railroad fuel coal, which must be kept separate from 
other shipments because its place of consumption is indeterminate. 
In general, it has been identified and eliminated from the revenue ship- 
ments, as reported by the originating carriers, by means of other re- 
ports on sources of coal purchased which the authors have collected 
from the fuel agents of the consuming carriers. The reconciliation of 
these two sets of reports has occasionally required arbitrary adjustment, 
but the errors thereby introduced are not believed to be great. A 
partial check on the results is given by data on consumption within the 
destination area, obtained from the Census of Manufactures and other 
sources. 

Two examples will illustrate the type of information obtained. 
Table I shows the movement out of the little Arkansas field. Arkansas 
produces coals of good quality, some of which are peculiarly adapted 
to domestic use, and ships widely over the Mississippi Valley. The 
field has been cruelly pressed by the competition of oil and gas, and 
since 1917 its shipments to destinations in the Southwest and its sales 
of railroad fuel have sharply declined. On the other hand, it has 
largely expanded its trade to the northward, where Arkansas semi- 
anthracite has grown in favor, and the increasing shipments to Missouri, 
Kansas, Nebraska, Iowa and Minnesota throw a flood of light on the 
shrinking market for Pennsylvania anthracite in those states. 







TABLE I
DISTRIBUTION OF COAL PRODUCED IN THE ARKANSAS FIELD, 1917 AND 1929 *
(Thousand net tons)

[Table body garbled in this copy. Legible row headings: Used in Arkansas; Sold to local trade, not shipped; Shipped to other states. Legible figures, whose rows and columns cannot be assigned: 59; 20; 9; 117; 31; 6; 250; 50; 89; 36; 127; 737; 2,143.]

* Corresponding data are being obtained for other producing fields. Figures for 1929 subject to 
revision. 
(a) Less than 500 tons. 


Table II illustrates the reverse of the picture by showing (in much
abbreviated form) the sources of all the coal received in Iowa. Iowa
is one of the many competitive battlegrounds of the coal market. It
has mines of its own, handicapped by inferior quality and difficult
mining conditions, and it receives coal from 24 other sources lying at all
points of the compass. The changes since 1917 are typical of the
competitive shifts in the coal industry since the War. Receipts from
Iowa mines and from the unionized areas of the Middle West have
fallen sharply. On the other hand, receipts from non-union western
Kentucky have grown apace, as have those from the Appalachian
fields and from Arkansas and Oklahoma. Coincident with this increase
from the East and South, receipts from the Rocky Mountain
fields have declined abruptly.

TABLE II
SOURCES OF SOFT COAL CONSUMED IN IOWA *
(Thousand net tons)

[Table body garbled in this copy. Legible column heading: 1929. Legible figures, whose rows and columns cannot be assigned: 2,623; 3,282; 26; 84; 944; 1,052; 165; 490; 151; 61; 118; 40; 9; 9,447; 8,880; 11; 393.]

* Corresponding data are being obtained for other States. Figures for 1929 subject to revision.

The examples given will illustrate the material which is being ob- 
tained for all other producing fields in the country and for each con- 
suming state or region. It is of great interest to all those engaged in 
the fuel and railroad industries. We call attention to it because the 
methods used seem applicable to the study of other bulky commodities. 
Railroad traffic records offer a tool of great promise in market research. 
McFall has utilized them in analyzing the movement of commodities 
in and out of New England and of the Pacific Southwest.1 Since
1918 they have been used by the Department of Agriculture in compiling
data on the physical movement of fruits and vegetables.2 It is
probable they would be helpful in tracing the movement of many other 
crude and partly finished products which are sold in carload lots. 
What can be done to improve and strengthen our present statistics of 
railroad commodity traffic? 

If coal only is considered, the path of progress is to encourage the 
railroads to develop along the lines they have already marked out. A 
movement has been started by some of the trunk-line carriers to estab- 
lish a coal statistical bureau covering eastbound shipments from the 
Appalachians. This would clear up the most serious blind spot in the 
present records and is plainly in the public interest. Much can be 
done also by encouraging individual traffic managers to keep their 
operating records in a form that would permit more ready combination 
with those of other roads. 

But if other commodities are to be included, it might be well to con- 
sider a national system of reporting the movement of bulk traffic be- 
tween the several regions of the country. The ultimate ideal, from 
the point of view of the market analyst, is represented by the German 
traffic statistics.3 The Reich is divided into 46 traffic regions, such as
Berlin and environs, the Ruhr, East Prussia, etc., and the record shows 


1 Robert J. McFall, The External Trade of New England, and Transcontinental and Intercoastal Trade 
of the Pacific Southwest in 1926, U. S. Bureau of Foreign and Domestic Commerce. 

2 Division of Fruits and Vegetables, Bureau of Agricultural Economics, Carlot Shipments and Unloads
of Important Fruits and Vegetables.

3 Die Güterbewegung auf Deutschen Eisenbahnen.







the movement from each of the 46 regions into every other region. 
This is given not only for coal but for 104 other bulk commodities, in- 
cluding products of the mines, of forests, of agriculture, and of the 
heavy manufacturing industries. The whole geography of German 
heavy industry is laid bare as if with a scalpel. 
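
A record of this kind is an origin-and-destination table: one cell for each pair of regions, holding the tonnage moved from the first into the second. The sketch below shows how such a table could be tallied from individual shipment records. It is an illustration only; the function names and the sample tonnages are hypothetical and are not figures from the German statistics or from this study.

```python
from collections import defaultdict

def od_matrix(shipments):
    """Sum tonnage by (origin, destination) from (origin, destination, tons) records."""
    matrix = defaultdict(float)
    for origin, destination, tons in shipments:
        matrix[(origin, destination)] += tons
    return dict(matrix)

def margins(matrix):
    """Totals shipped by each origin and totals received by each destination."""
    shipped, received = defaultdict(float), defaultdict(float)
    for (origin, destination), tons in matrix.items():
        shipped[origin] += tons
        received[destination] += tons
    return dict(shipped), dict(received)

# Hypothetical records in thousand net tons.
records = [
    ("Arkansas", "Iowa", 10), ("Arkansas", "Missouri", 25),
    ("Illinois", "Iowa", 40), ("Arkansas", "Iowa", 5),
]
matrix = od_matrix(records)
shipped, received = margins(matrix)
```

Read along a row, the table gives the distribution of one producing field, as in Table I; read down a column, it gives the sources of one consuming region, as in Table II.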

How far it is expedient to move in the direction of the German system 
under the very different conditions prevailing on American railroads is, 
of course, a matter for the Interstate Commerce Commission to decide. 
Dr. Lorenz, the Director of the Bureau of Statistics, has already done 
much to improve the Commission’s commodity statistics. Starting 
with January, 1928, the Commission increased the number of com- 
modity classes distinguished in its traffic records from 70 to 157, and 
began to show the tonnage terminated by each carrier as well as that 
originated.1 This information is collected quarterly and is published
with commendable promptness: it deserves careful study by all inter- 
ested in marketing. Its usefulness would be greatly increased, how- 
ever, if the tonnage terminated on the lines of a given system, such as 
the Baltimore and Ohio, could be subdivided by states or other appro- 
priate regions. As it is, we know only that the Baltimore and Ohio 
unloaded so many tons of freight somewhere between the Mississippi 
and the Atlantic Seaboard. The present reports, also, leave unan- 
swered the question of where the tonnage unloaded may have come 
from, except to show whether it originated on the carrier’s own line 
or was received from connections. 

Any expansion of the commodity statistics must naturally be 
settled by the Commission in the light of its primary responsibility in 
regulating rates and services, and with due regard to the views of the 
carriers. The outsider can do no more than indicate his interest in 
the development of records which promise to be a major tool in market- 
ing research. 


1 Interstate Commerce Commission, Bureau of Statistics, Freight Commodity Statistics, Class I, Steam 
Railways. The annual report gives data for each railway, and the quarterly summary, the total for each 
of eight regions. 







PRINCIPLES OF STATISTICAL METHODOLOGY 


By Arthur R. Crathorne


The title of this paper contains a word against which legitimate 
criticism can be raised, in that it does not express the main interest of 
the theoretical statistician. The word is “methodology.” We often
hear it used in connection with statistics, and we often hear statistics
spoken of as a method. A recent German writer in discussing statistics
said that in Germany it was considered as a science, but in some
other countries as a method. The word, methodology, seems to carry
with it a notion of routine, of established rules of procedure, of a thing 
that is finished, but does not describe that part of statistics which we 
may call theoretical or mathematical, or a science of statistics. At the 
present time no scientific subject is more alive and growing. A treatise 
on mathematical statistics written now might be out of date in some 
parts before the author could get it published. Of course we need and 
have method in statistics. We could not get along without it, but let 
us think of method as a part of the whole and not think of theoretical 
statistics and statistical methodology as synonymous terms. We have 
method in chemistry, but few people confuse chemistry with chemical 
methodology. 

While I am still on the subject of the title, I may mention the word 
“principles.” When I first began to write this paper I found I was
writing something which should be entitled “A list of the important 
things of a mathematical nature used in statistics.” Nothing could be
much easier to do than this. The list is long and paragraphing would 
be easy, one paragraph each on averages, dispersions, frequency func- 
tions, random sampling, interpolation, correlation, periodogram analy- 
sis and so on. But this would make theoretical statistics merely a 
collection of mathematical tools. The difference between this and 
mathematical statistics is somewhat like the difference between a 
course in college algebra and a course in calculus. College algebra is a 
collection of topics to which custom has given a name. In calculus we 
have one central idea to which everything in the course is related. 

There was a time not so very long ago when the search for a theoreti- 
cal foundation for statistics was considered a phantasy. The efforts of 
Quetelet to bring some order out of the chaos had broken down and his 
detractors seemed to be satisfied with an easy victory. Statistics was 
in danger of being broken up and absorbed in other subjects. We had 
what I may call the nebular hypothesis, which conceives of a vast
chaos of seething statistics rapidly revolving and now and then throw- 
ing off a large mass which condenses into a rather well defined planet. 
This phenomenon has occurred twice, resulting in our theory of errors 
and in actuarial science, and the words biometry and sociography in our 
vocabulary indicate that the process is still going on. It then becomes 
the object of mathematical statistics to produce a group of Newtons 
who, lying under the statistical apple tree and seeing a random sample 
of apples fall, will develop a body of laws which explain the unity and 
stability of the new solar system. 

During the years succeeding Quetelet, notwithstanding the work of 
Laplace, Poisson, and, later, Tchebycheff and Lexis, little advance was 
made in a theoretical way by men primarily interested in statistics. 
We had rather vague theories embodied in inductive philosophy, in 
which statistics was assigned the task of taking care of those loose 
causal relations which arise in connection with plurality of causes and 
effects. But statisticians were busy with other phases of their prob- 
lems and as in other sciences the examination of the foundations could 
wait. The movement which developed into the science which we call 
mathematical statistics is less than half a century old. There are 
probably some in this room who could tell us of interesting personal 
contacts with most of the men whose names stand out in this develop- 
ment: Gram, Thiele, Lexis, Edgeworth, Pearson, Charlier, Fisher. 
Interest in the subject in this country is probably less than twenty-five 
years old. In 1914, the Association celebrated its seventy-fifth an- 
niversary by publishing a History of Statistics, but interest in the 
theoretical side was so slight at that time that it has practically no 
mention. And yet we have heard the word “arrogant” applied to the
mathematical statistician, that he strutted about in his little domain 
as if he ruled the whole empire. That is just a little unkind. No one 
is more aware than he of the limits of his subject. Part of his interest 
is knowing when not to use mathematics. He is not the one who likes 
to see a page speckled with summation and integral signs because it 
looks scientific. No amount of mathematical training and ability can 
take the place of the judgment and common sense that comes from a 
knowledge of the field in which the problem lies. 

When one wishes to emphasize the fact that a branch of knowledge is 
well organized or wants to summarize it in a few words, it has become 
rather customary to speak of its problem. Thus we have the problem 
of theoretical statistics. Many men have stated it. In its most con- 
cise form the problem of this part of statistics is usually stated to be the 
reduction of large masses of data to a few quantities, each of which is 
often called a statistical constant, or, more recently, a statistic. For
example, R. A. Fisher thus states the problem: “A quantity of data
which usually by its mere bulk is incapable of entering the mind, is to 
be replaced by relatively few quantities which shall adequately repre- 
sent the whole, or which in other words shall contain as much as pos- 
sible, ideally the whole, of the relevant information contained in the 
original data.” Other men are more specific. Pearson says the 
fundamental problem of statistics is as follows: “An event has occurred
p times out of n trials where we have no a priori knowledge of the frequency
of the event in the total population of occurrences. What is
the probability of its occurring r times in s trials.” Von Mises says the
problem consists in finding out whether our data can be considered as a 
population, using the term population in its technical sense, and if it 
can be so considered, in discovering how it is constituted. These last 
two statements of the problem are really not far apart. 
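
Pearson's statement of the problem can be given a numerical form once “no a priori knowledge” is made precise. The sketch below assumes, as one common reading, that every value of the unknown frequency is taken as equally likely beforehand (the Bayes-Laplace assumption); this is an assumption of the sketch and is not stated in the quotation above. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def predictive_probability(p, n, r, s):
    """Chance of r occurrences in s further trials after p occurrences in n trials.

    Assumes a uniform prior on the unknown frequency, which gives
        C(s, r) * (p + r)! * (n - p + s - r)! * (n + 1)!
        ------------------------------------------------
                  (n + s + 1)! * p! * (n - p)!
    """
    f = factorial
    numerator = comb(s, r) * f(p + r) * f(n - p + s - r) * f(n + 1)
    denominator = f(n + s + 1) * f(p) * f(n - p)
    return Fraction(numerator, denominator)
```

As a check, an event that has occurred on all of 3 trials is given the chance 4/5 of occurring on the next one, which agrees with Laplace's rule of succession, (n + 1)/(n + 2).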

It is not my intention to formulate another statement of the problem 
but to give a sort of outline of the process of reduction of the data to a 
few statistics. We start with a large mass of data. To make the 
language concrete, let each observation or each set of related observations
be on a card. We will have two kinds of piles of cards. In one
pile the cards are ordered singly or in groups according to some attri- 
bute, usually time or space. This type of data leads to the problems of 
the measurement and elimination of seasonal variation, of the determi- 
nation of trends, cycles and lags, and to the question of correlation 
special to this type of data. Closely in touch with this kind of collec- 
tion of data is the theory of Lexis with its classification of statistical 
series into subnormal, normal and supernormal, and the seeming 
paradox of the antagonistic ideas of homogeneity and stability as given 
by Bortkiewicz. I shall not classify further this part of the problem 
of statistics because it happens that most of our discussions at this 
meeting are concerned with the other pile of cards. 
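
The Lexis classification mentioned above is usually carried out by comparing the observed dispersion of a series of rates with the dispersion a simple binomial scheme would give. The sketch below assumes groups of equal size, uses the divisor k - 1 for the observed variance, and adopts an arbitrary tolerance band around unity; all three are choices made for illustration, and the names are hypothetical.

```python
from math import sqrt


def lexis_ratio(counts, group_size):
    """Observed standard deviation of group rates divided by the binomial one."""
    rates = [c / group_size for c in counts]
    k = len(rates)
    mean_rate = sum(rates) / k
    observed_var = sum((x - mean_rate) ** 2 for x in rates) / (k - 1)
    binomial_var = mean_rate * (1 - mean_rate) / group_size
    return sqrt(observed_var / binomial_var)


def classify(ratio, tolerance=0.1):
    """Label a series; the tolerance band is an arbitrary illustrative choice."""
    if ratio < 1 - tolerance:
        return "subnormal"
    if ratio > 1 + tolerance:
        return "supernormal"
    return "normal"


# Hypothetical counts of an event in four groups of 100 observations each.
q = lexis_ratio([10, 90, 10, 90], 100)
print(q, classify(q))
```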

In this pile, the cards are arranged in a frequency distribution. 
Two general classes of problems arise, according to whether we consider 
the data to be complete, like a census report, or merely a random sample 
from a larger aggregate. In the first of these general classes we have 
the problem of describing the distribution as it is given to us by means 
of a few characteristic numbers which may or may not be parameters in 
some analytic representation which has been chosen. That is, we are 
simply trying to describe the data that has been given to us without 
making any general inferences except as we may make allowances for 
the inaccuracies of our observations. This part of mathematical sta- 
tistics is sometimes called descriptive statistics. The questions which 
arise are those concerned with the choice of the statistics to represent
the data, or with the kind of analytic representation which gives a best
or most convenient fit, or perhaps with the testing of some hypothesis 
which has been suggested as to the law or causes of the given distribu- 
tion. Perhaps a comparison with another group of data is the problem. 

If the data given us is to be considered as a random sample from a 
larger population, a greater variety of problems arises but all goes back 
to the one question, “What are the characteristics of the population
from which this sample was taken?” We are now in that part of the 
field of mathematical statistics in which there is the greatest activity 
at the present time. There is, first, the question of the form of the 
distribution of the hypothetical population, where our guides may be 
Pearson, Edgeworth or Charlier. Experience is usually necessary in 
our choice of form, though we are often limited by expediency and 
must depend on published tables. While empiricism permeates our 
theory through and through, we have a safety valve in the “goodness of
fit” tests of the adequacy of our estimated population to represent our 
data. 
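
One common form of such a test compares observed class frequencies with those given by the fitted distribution through the sum of squared discrepancies, each divided by the expected frequency. The sketch below computes only that sum; the choice of degrees of freedom and the reference to published tables are left aside, and the function name and figures are illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over the classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies in four classes, and those of a fitted curve.
print(chi_square_statistic([18, 32, 29, 21], [20, 30, 30, 20]))
```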

Out of all the forms of distribution which are at our command there 
is one which stands out among all others like the straight line in ele- 
mentary mathematics. This is the so-called normal or Gaussian dis- 
tribution, popularly called the probability curve. It was once con- 
sidered as adequate to satisfy the demands of the statistician, then 
considered as a first approximation, then as a very special case of a 
more general distribution, and now we often see the statement that in 
practice we rarely ever find normal distributions. Nevertheless this 
distribution is of supreme importance. We cannot dispense with it any 
more than we can dispense with the right angle because we often hear 
it stated that the right angle is rarely found in nature. The change in 
attitude toward the normal distribution is marked nowadays by such 
phrases as “the changing notion of probable error,” and by the care 
taken to qualify conclusions by the insertion of the parenthetical clause 
“assuming normal distribution.” This brings up a problem of investigating
the effect of this so-called normal hypothesis on the values
of statistical constants. 

After the form for the distribution has been chosen, the next problem 
is that of estimating the values of the statistics describing the popula- 
tion of which we know only a random sample. The words “estimate” 
and “estimation” used in this connection are coming to have a very 
technical meaning attached to them. R. A. Fisher defines estimation 
as follows: “Problems of estimation are those in which it is required to 
estimate the value of one or more of the population parameters from a 
random sample of the population.” 







But the finding of a value for a statistic is not sufficient. It was 
found from the information contained in the sample at hand. Another 
sample would give a different result. If we have at our command the 
results from many samples we can find empirically something about the 
reliability of the statistic in question. It may be that some other sta- 
tistic would serve our purposes just as well. We would then have a 
problem of comparing the distributions of the two to find the one most 
reliable. 
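
Where many samples are at our command, the comparison just described can be imitated by drawing them artificially. The sketch below assumes a normal population with zero mean and unit standard deviation, and compares the mean and the median as two competing statistics; the sample size, the number of samples and the seed are arbitrary illustrative choices.

```python
import random
import statistics


def sampling_variances(sample_size=25, n_samples=2000, seed=0):
    """Variance of the mean and of the median over repeated normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = sampling_variances()
print("variance of the mean:  ", var_mean)
print("variance of the median:", var_median)
```

For a normal population the mean should show the smaller variance, which is one sense in which it may be called the more reliable of the two.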

But we have only one sample. The general problem to solve here is, 
given a population, what is the distribution of some designated pa- 
rameter calculated from a random sample. It has been solved in a few 
special cases. This problem, again, falls into two merging classes de- 
pending upon the size of the sample. The older problem, that of large 
samples, has proven inadequate in many fields where it is impossible to 
get large samples, as in engineering and agriculture. Anticipated 
normal distribution is often disappointing; so we have the rapidly in- 
creasing literature on the theory of small samples, with the names of 
“Student” and R. A. Fisher leading all the rest. Having found the 
distribution of a statistic, one can conceive the further problem of finding the
distribution of the constants of this newly found distribution. 

Each card of our pile may contain more than one piece of information 
about an individual of the population or the sample. When it does, we 
have frequency distributions in more than two dimensions leading to a 
much richer field for investigation. We have all the problems which 
we have enumerated in a more complicated variety, with the addition 
of the questions concerning the relationship, if any, existing between 
the various sets of data on our cards. This brings us to the wide field of 
correlation, much of which is virgin soil. What I said about statistical 
methodology is well illustrated in correlation. The methodological 
side has been developed until we can find correlation coefficients by 
simply turning a crank, but the explanation of the meaning of the re- 
sult after we find it, needs a brain. 
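
The turning of the crank may be shown in a few lines. The sketch below computes the ordinary product-moment coefficient for paired observations; the function name and the figures are illustrative, and nothing in it supplies the interpretation.

```python
from math import sqrt


def correlation(xs, ys):
    """Product-moment correlation coefficient of two paired series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements.
print(correlation([61, 64, 67, 70, 73], [120, 135, 150, 149, 170]))
```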

Running parallel with our discussion of statistics, there could be a 
discussion of another branch of science, the theory of probability with 
now and then a bridge connecting the two. This is one conception of 
the relation between statistics and probability. Another conception is 
that of a sort of mist through which the statistician must peer. Still 
another is that the theory of probability is the tool house for statistics. 
We have all sorts of statements, from “the theory of probability is dead
and should be buried” to the notion of some that probability and sta- 
tistics are one and the same thing. The most interesting part of the 
controversial discussion of the relation between statistics and probability
centers about what is known as Bayes’ theorem. If an event
has happened and it is known to have one of several causes, then under 
certain circumstances Bayes’ theorem purports to tell us the chance
that the event followed from some chosen cause. The problem is really 
one of finding the form of a distribution. In discussing the principles 
of theoretical statistics we cannot escape the question of their relation 
with the theory of probability and nothing is more important than the 
questions arising about this theorem. 
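
The arithmetic of the theorem itself is simple; the controversy lies in the prior chances it requires. The sketch below assumes that the prior chance of each cause, and the chance of the event under each cause, are both known, which is precisely the point in dispute. The function name and the figures are illustrative.

```python
from fractions import Fraction


def posterior_over_causes(priors, likelihoods):
    """Chance of each cause, given that the event has happened.

    priors[i]      : assumed chance of cause i before the event is observed
    likelihoods[i] : chance of the event when cause i operates
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical causes, taken as equally likely beforehand.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(1, 4), Fraction(3, 4)]
print(posterior_over_causes(priors, likelihoods))
```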

In the above I have tried to give a sort of background for the real 
part of our program which includes papers and discussions on frequency 
distributions, the normal hypothesis, the theory of small samples, 
correlation, the method of Lexis and Bortkiewicz and on the relation 
between statistics and probability. 







THE CONCEPT AND UTILITY OF FREQUENCY 
DISTRIBUTIONS 


By Harry C. Carver


In preparing this paper I have been guided by instructions which 
called for “a very general discussion of the representation of frequency
distributions by analytic functions—not expounding any particular 
method, but giving a bird’s-eye view of the whole field. Further, the 
paper should be for the man who has hardly ever heard of the subject 
and is likely to ask the question, ‘What is the use of doing all that?’”

Frankly, I can see nothing strange in this request at the present 
time, since the vast majority of books and articles on the subject 
of frequency distribution graduation fail to provide the answer. They 
merely present an observed distribution, find an analytic function that 
affords a more or less satisfactory fit to the distribution, obtain from 
the function employed frequencies corresponding to those of the ob- 
served distribution—and keep the reason for doing all this a deep, dark 
secret. 

I might go even further, and state that there are probably not a few 
individuals who have performed a number of graduations who are still 
wondering what it is all about. They probably have observed that if
they employed moments up to the third order in the graduation, then
the mean, standard deviation and skewness of the graduated frequencies
agree with those already obtained for the observed frequencies,
and that if in addition the fourth moment was employed, then the 
excess was likewise unaltered, and so on. In elementary statistics 
students are taught that the mean, standard deviation, etc., are em- 
ployed to describe the distribution in question, and it is therefore only 
natural for them to wonder why a distribution should be graduated if 
these characteristics are unaltered by the process. 
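
The four characteristics named above may be computed from a frequency distribution as follows. The sketch assumes the moment definitions of skewness (third central moment over the cube of the standard deviation) and of excess (fourth central moment over the square of the second, less three); other measures of skewness were also in use, and Sheppard's corrections for grouping are not applied. The function name and figures are illustrative.

```python
def describe(values, frequencies):
    """Mean, standard deviation, skewness and excess of a frequency table."""
    total = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / total

    def central_moment(order):
        return sum(f * (v - mean) ** order for v, f in zip(values, frequencies)) / total

    m2 = central_moment(2)
    m3 = central_moment(3)
    m4 = central_moment(4)
    standard_deviation = m2 ** 0.5
    skewness = m3 / m2 ** 1.5
    excess = m4 / m2 ** 2 - 3
    return mean, standard_deviation, skewness, excess


# Hypothetical class marks and observed frequencies.
print(describe([1, 2, 3, 4, 5], [4, 10, 14, 8, 2]))
```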

In an attempt to answer this question, let me first remind you that 
data collected and presented in the form of a frequency distribution 
is, with rare exception, merely itself a single sample that should be 
regarded as withdrawn from a larger parent population. In most cases 
the parent population must be regarded as infinite, in the minority 
of cases, finite. From this one sample we attempt to infer something 
about the parent population; certainly future samples will depend 
more directly upon the parent population than upon the single sample 
already observed. It is clear, therefore, that the primary object of 
all but the rarest of statistical investigations is to gain knowledge,
as accurately as possible, concerning a more or less hypothetical parent
population. The mean, standard deviation, the correlation coefficient 
—or whatever may be computed from the sample—can be regarded 
only as an approximation for the corresponding function for the will-o’- 
the-wisp parent population. The so-called probable error attempts 
to measure expected discrepancy between functions computed from a 
single sample, and the corresponding function for the unknown parent 
population. 

The preceding discussion may be illustrated by a simple problem: let 
us suppose there are 300,000 males in Cleveland between the ages of 20
and 45, and that we desire to obtain the average height or weight of
these individuals, or the correlation between their height and weight. 
For practical reasons it would be almost impossible for an investigator 
to obtain the desired data for this entire group—he might collect data 
for, say, 10,000 individuals. A little reflection will recall the fact that 
his sample is but one of a huge number that he might have chosen by 
random selection; the number of different samples is equal to the num- 
ber of combinations of 300,000 things taken 10,000 at a time and con- 
tains in excess of 19,000 digits. Some of these samples would produce 
very high averages and correlations, and others would go to the other 
extreme. The inference that phenomena peculiar to the 300,000
individuals in the parent population will agree exactly with those of a
single random sample is untenable. It may be exceedingly important
to determine the probability that the phenomena of the 300,000 individuals
do not differ from those of the single sample by more than a
certain per cent. 
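
The size of the number of possible samples can be checked without writing the number out. The sketch below counts the digits of the number of combinations by means of the logarithm of the gamma function; the function name is illustrative, and the method is a convenience chosen here, not one described in the text.

```python
from math import floor, lgamma, log


def digits_in_combinations(n, k):
    """Number of decimal digits in the number of combinations of n things taken k at a time."""
    log10_value = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)
    return floor(log10_value) + 1


# The Cleveland illustration: 300,000 individuals, samples of 10,000.
print(digits_in_combinations(300_000, 10_000))
```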

The theory of sampling is concerned with the problem: given the 
characteristics of the parent population, required the probability that 
a single sample withdrawn at random will lie between given limits. 
The theory of probable error is the inverse of the sampling problem, 
and is concerned with the following: given the characteristics of a 
single sample, required the probability that the corresponding charac- 
teristics of the parent population will not differ from those of the 
sample by more than a given per cent. The answer to both of these 
problems requires in general that we conceive the parent population 
as one distribution, all possible samples arising from the parent popu- 
lation as another distribution, and lastly that there exist analytic func- 
tions capable of representing both distributions. For this reason it is 
very important that we know at least one function capable of repre- 
senting distributions of heights, others for weights, others for distri- 
butions peculiar to education, psychology, etc. It has been discovered 
that some distributions can be represented satisfactorily by the point
binomial, or its limit, the Poisson exponential. Other distributions
appear to follow closely the so-called Normal Law, and other explicit 
functions comprising the well-known system devised by Karl Pearson. 
On the other hand, each of these functions may be regarded as but a 
first approximation to a law of frequency, and may be employed as a 
sort of generating function in an infinite series. Noteworthy contri- 
butions to this infinite series representation have been made by Gram, 
Charlier and Romanovsky. 
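
The sense in which the Poisson exponential is the limit of the point binomial may be seen numerically. The sketch below compares the two sets of frequencies term by term, the Poisson mean being taken equal to the binomial mean np; the function names, the cutoff `k_max` and the figures are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Chance of exactly k occurrences in n trials of constant chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Chance of exactly k occurrences under the Poisson exponential."""
    return exp(-mean) * mean ** k / factorial(k)


def largest_gap(n, p, k_max=20):
    """Largest term-by-term difference between the two laws for k = 0..k_max."""
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(k_max + 1)
    )


# The mean is 2 in both cases; the agreement improves as n grows and p shrinks.
print(largest_gap(10, 0.2))
print(largest_gap(1000, 0.002))
```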

Comparatively little can be rightly said against the effectiveness of 
Pearson’s curves, although much opposition is justified due to the 
difficulty that one experiences in computing necessary ordinates and 
areas for all but two of these curves for which tables have been pre- 
pared. Charlier’s series, both type A and type B, are of great impor- 
tance in dealing with problems of a priori probability and sampling in 
spite of the fact that the difficulties that arise in integrating the type 
B series for the purpose of totaling frequencies between given limits 
are quite as serious as the determination of areas under Pearson’s 
curves. Any objections raised to difficulties of handling Pearson’s 
curves singly are, of course, equally significant when these same curves 
are used as generating functions in the infinite Gram-Charlier-Roman- 
ovsky series. From the point of view of both practical statistical 
application and rigorous mathematical demands, it is highly desirable 
that remainder terms for these infinite series be obtained. I believe 
that nothing has been accomplished along these lines, although con- 
siderable effort has been expended in attempts to establish the mathe- 
matical convergence of these series. However, it should be borne in 
mind that merely proving their convergence may be of relatively little 
importance so far as the practical application of these series may be 
concerned. A mathematical triumph may be achieved by proving 
a series convergent, but if several thousand terms be required in order 
that the remainder may be sufficiently small to justify the requirements 
of accuracy of determination, it can scarcely be said that a practical 
solution to the problem has been obtained. Convergence in the sense 
of pure mathematics is one thing; convergence from the standpoint of 
actual computation is quite another. 
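
The distinction may be illustrated with a series far simpler than those under discussion. The sketch below uses the Leibniz series for pi, chosen only as a familiar stand-in and not as an instance of the Gram-Charlier-Romanovsky series: it is convergent beyond dispute, yet the number of terms required grows roughly in inverse proportion to the accuracy demanded.

```python
from math import pi


def terms_needed(tolerance):
    """Terms of 4 * (1 - 1/3 + 1/5 - ...) needed to come within tolerance of pi."""
    partial = 0.0
    k = 0
    while True:
        partial += 4.0 * (-1) ** k / (2 * k + 1)
        k += 1
        if abs(partial - pi) < tolerance:
            return k


for tolerance in (1e-1, 1e-2, 1e-3):
    print(tolerance, terms_needed(tolerance))
```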

The utility of frequency functions in efforts to condense statistical 
records should not be overlooked. The frequency distribution is itself 
an efficient device for presenting large masses of data in both a logical 
and compact form. If a satisfactory representation of such a distribu- 
tion can be obtained by utilizing any frequency function, then the 
distribution itself may be discarded and approximately reproduced at 
will by merely retaining the analytic equation. I am unable to be
greatly excited over employing graduation merely for the smoothing of
commonplace distributions. Here is where I say, “What of it?” If 
smoothing be necessary, satisfactory methods far simpler in operation 
are available. 


DISCUSSION 


By Burton H. Camp 


I agree with what Professor Carver has said, both as to detail and as 
to emphasis. There are three main reasons why a student should be 
taught to graduate a curve. The first, and least important, has to do 
with the use of a smooth curve in place of a jagged sample. The 
second, and most important, is that it is necessary for the mathematical 
development of statistics that the mathematician should be told what 
assumptions he may make. These usually depend on the types of fre- 
quency curves which can be depended on to fit phenomena. Professor 
Carver has not mentioned, except by implication, a third reason, inter- 
mediate in importance between the other two. This is that, in testing 
a priori theories in various fields, it is often necessary to test the efficacy 
of the frequency distributions which are results of these theories. 







THE TECHNIQUE OF GATHERING AND TABULATING 
VACANCY DATA 


By Bernard J. Newman


A majority of the vacancy surveys that have been reviewed by the 
writer, have revealed little information of practical value. They have 
clearly demonstrated that those responsible for them have had only a 
slight appreciation of the legitimate objectives of a vacancy study. 
Simply to discover that five per cent or fifteen per cent of the family 
accommodations of a given area are unoccupied, or that twenty per 
cent of the office floor space of a city is for rent, means almost nothing. 

It is a fundamental principle in such survey work that there must be 
a broad understanding of the possible uses to be made of the desired 
data before the schedule card is drafted. Equally essential is it to 
know the field of study and the interrelated factors active therein, or 
that presumably may be thus active. I am not suggesting that a thor- 
ough investigator will advance a theory which he is determined to 
prove, although such a procedure is an ever present danger in this day 
of highly organized propaganda and promotion work. It is legitimate 
to assume a possible relationship, cause or conclusion, and provide for 
the collection of pertinent data having a bearing upon them. This, 
however, is not the method of the propagandist. He limits the scope 
of his investigation to those factors which in his judgment will prove his 
preconceived conclusion. The important point is to make an impartial 
approach to the study, and to incorporate on the schedule card all 
possibly related factors in order that the analysis will permit of authen- 
tic conclusions. The perusal of vacancy surveys shows that few in- 
vestigators have been so thorough. 

Without specifying the particular groups making the studies which 
have failed, we can refer to the types of vacancy surveys which are of 
limited value, if indeed they are not valueless. I have in mind three 
national groups that publish reports in this field. One is an industrial 
association; another, a professional group; and the third, a govern- 
mental bureau. In so far as I have been able to ascertain, and I base 
my judgments upon schedules and information which they have fre- 
quently forwarded to my office for the notation of data, they arrive at 
their conclusions from reports of local organizations or business houses 
without requiring an explanation as to how such local data have been 
collected. Their prestige as national organizations invests their statements
with unwarranted authority, particularly if the local data supplied
to them have been assembled by police or other poorly supervised
investigations. 

An example of such surveys comes from a large American city where 
the police canvassed vacancies in family accommodations and ex- 
pressed judgments as to whether such vacancies were in good, fair or 
bad condition, without having had an instruction sheet to unify their 
judgments. In arriving at their rate of vacancy, they used an assessor’s
list of residential buildings but did not total family accommodations
in those of the apartment type, nor did they include the recently
completed new construction, nor did they eliminate dwellings recently 
converted to non-residential uses or demolished. 

One national association assembles information received from local 
real estate offices, where vacancy figures are arrived at by comparing 
the number of vacant houses with the number of properties managed. 
Besides the likelihood of being an imperfect cross section, such figures 
often have the added error of being based on information obtained from 
sources which are, at times, interested in propaganda for, or against, an 
extended dwelling construction program. 

Where local associations, engaged in the management of properties, 
are consulted, instead of individual brokers, equally unreliable data are 
often used. Thus, an apartment house owners’ association made a 
vacancy study in an effort to increase its mortgage loans. It differ- 
entiated between old apartments and modern apartments by classifying 
in the former group all non-fireproof tenements and, in the latter 
group, all fireproof buildings. It ascertained the number of apart- 
ments in each tenement and the gross number of rooms but omitted 
data which would have enabled it to classify the number of apartments 
by number of rooms. It also ascertained the number of apartments 
vacant and the number of rooms vacant but not the number of vacant 
apartments by number of rooms. In sending forth its survey schedule 
it assured its members that the buildings need not be identified with 
the information. Manifestly, this is an incomplete schedule. The 
investigators proved the worthlessness of their findings by determining 
the percentage of vacancies in old buildings and the percentage of 
vacancies in modern buildings, and then averaged their averages. 
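
The error of averaging the averages is easily shown. In the sketch below the figures are hypothetical, chosen only to make the two classes unequal in size; the rate for all buildings together is found by pooling the counts, and is compared with the simple mean of the two class percentages.

```python
from fractions import Fraction


def pooled_rate(groups):
    """Vacancy rate for all groups together: total vacant over total units."""
    vacant = sum(v for v, _ in groups)
    units = sum(t for _, t in groups)
    return Fraction(vacant, units)


def mean_of_rates(groups):
    """Simple average of the separate group rates (the faulty procedure)."""
    return sum(Fraction(v, t) for v, t in groups) / len(groups)


# Hypothetical counts: (vacant apartments, total apartments).
old_buildings = (50, 1000)     # 5 per cent vacant
modern_buildings = (30, 100)   # 30 per cent vacant
groups = [old_buildings, modern_buildings]

print("pooled rate:  ", float(pooled_rate(groups)))
print("mean of rates:", float(mean_of_rates(groups)))
```

The two results agree only when the classes contain the same number of apartments.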

Even the most comprehensive study emanating from a large city had 
the error of manifestly incomplete information. In the published 
reports of this survey, classifications of vacancies were made according 
to “old law” and “new law” buildings as found in the different govern- 
mental subdivisions, and by apartment rentals and rents per room. 
Nowhere did it appear that information was obtained about the number 
or rate of vacancies by size of family accommodation. Only by assuming
that all apartments in the lower rental range or in the “old law”
buildings were substandard in equipment and repair was there any 
correlation of such vacancies with structural and sanitary conditions. 

Here, as in the other surveys referred to, the outstanding criticism 
of the survey technique is the limited appreciation of the fundamental 
objectives of a vacancy study. It is not my purpose to outline such 
objectives, since the needs of different communities may vary, and with 
varying needs will arise varying objectives. But let us assume that the 
banks granting building construction loans wish to know how many 
new dwellings or family accommodations can safely be financed in any 
given year, or that the builders may wish to control their output so 
that they will not glut the market, or that owners of rental houses may 
wish to regulate rents or to determine the necessity for a renovation or a 
rebuilding program. If the vacancy survey provides them with no 
more data than can be summarized by a statement of the rate of 
vacancy en masse or of the vacancy by rental range, by old and new construction,
by wards or boroughs with their widely diverging nationalities,
types and values, how can they be any better qualified to solve
their problem after the survey is completed than before? 

It is readily conceivable that a community may have a high vacancy 
rate and yet have a housing shortage because the mass of vacancies may 
be either in areas where local conditions set up strong rental or sales 
resistance or in properties where the rental or sales range may be too 
high or too low to serve the number of families in the economic class 
for which they were designed. Such properties may be inconveniently 
located; they may be too good for their neighborhood; they may be too 
poorly equipped or too severely handicapped by inadequate transit 
or other service facilities; they may be isolated, or of a type which the 
population has not accepted, or of poor construction. If the family 
accommodations are in apartment buildings, they may be of too many 
or too few rooms. Any of several of these factors, causing a sizable 
number of vacant properties, may be due to builders’ follies and yet 
have but slight significance in evaluating the dwelling construction 
needs of the community. Unless their handicaps are recognized and 
removed, they may be ignored as a potential supply of accommodations 
in determining the market needs. Therefore, it is only by establishing 
the objectives of the survey that an inclusive schedule card may be 
drafted. 

Let us, however, assume that the objectives have been satisfactorily 
established. Next in importance are the items that can be ascertained 
from public records or the records of other surveys in allied fields. In 
this group will be population figures, income range groups denoting
buying power capacity, assessment records, zoning regulations establishing
use districts, transit layouts and like information essential to the
establishment of rates or to gauge the influence of factors that enhance 
or retard sales or occupancy. Since these data are largely matters of 
official records and change but little from year to year, once having been 
assembled for the initial survey, they require only such attention in 
subsequent surveys as is necessary to keep them up to date. Not so 
those other items that can be supplied only by a house to house canvass, 
data concerning which vary from year to year. 

In a vacancy study now being made by the Philadelphia Housing 
Association these latter items are grouped as follows: 


PHILADELPHIA HOUSING ASSOCIATION VACANT HOUSE SURVEY

STREET AND NUMBER        MATERIAL
NEIGHBORHOOD: CONDITION OF STREET: Repaired
Type: Industrial    Commercial    Residential
Economic Group        PREDOMINANT NATIONALITY
HOUSE CONDITION: Excellent .... Good .... Fair .... Bad ....
family accommodations
EQUIPMENT: Gas    Electricity    Hot Water
Steam ... Hot Air ... Stove ... Sink ... Bathtub ... Hot Water Tank ...
Water Closet        Water Supply
Bathroom        Could one be installed
Rent Paid by Last Tenant
FOR SALE        Sheriff Sign
WHEN BUILT
REPAIRS OR IMPROVEMENTS: Being Made ....

REMARKS


In order to assure uniformity in the collection of this material by the 
several field workers, the Association has prepared detailed instructions 
defining each item and suggesting methods of approach where difficul- 
ties might arise in contacts with agents or owners, or where access to the 
property may not readily be obtained. 

It has been the writer’s experience that the greatest difficulty in the 
way of obtaining uniform judgments arises in those items where the 
personal opinion of the investigator enters, such as the classification of 
structures as “Excellent, Good, Fair, or Bad.” In connection with
such items, besides, the field workers are required to amplify their
opinions under “Remarks.” The instruction sheet also emphasizes
the importance of supplementary information not uniform in character 
but pertinent to particular properties, by means of which exceptional
causes of vacancies are recorded and which in turn help to determine
whether the particular vacant property is not available for occupancy. 
Thus the property may be tied up in litigation, ownership may be un- 
known, or it may have been vacated for public or private improvements. 

The selection of representative cross sections of the community where 
field studies are to be carried on is as important as the survey card and 
instruction sheet, for it is not necessary to make an extended study of 
all properties in the city. In the several vacancy surveys conducted by 
the Philadelphia Housing Association, the total number of houses has 
varied from 20 to 25 per cent of the number of houses in the city. 
Districts should be selected so as to include a proportionate number of 
families in several economic groups and with the same ratio of white to 
Negro as prevails in the city’s population. Unless care is exercised 
in the selection of the cross section, errors will arise which will nullify 
the value of the survey. For example, if the proportion of alley
properties in the areas surveyed is in excess of the ratio of such
properties in the city, they, being of a
very inferior class, will create an excess of vacancies. In like manner,
if a section of the survey field includes an area of new construction out 
of proportion to the number of such houses in the total housing accom- 
modations of the city, the resulting data will be unbalanced, since it is 
frequently two or three years before new construction is absorbed. 
Great care must be exercised, therefore, in the selection of the areas 
of study. 
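
A simple check on the balance of a proposed cross section is to compare the share of each class of property in the areas chosen with its share in the city. The sketch below is one way of making that comparison; the class names and counts are hypothetical, and a ratio well above or below unity marks a class that is over- or under-represented.

```python
def representation_ratios(city_counts, sample_counts):
    """Share of each class in the sample divided by its share in the city."""
    city_total = sum(city_counts.values())
    sample_total = sum(sample_counts.values())
    ratios = {}
    for name, city_count in city_counts.items():
        sample_share = sample_counts.get(name, 0) / sample_total
        city_share = city_count / city_total
        ratios[name] = sample_share / city_share
    return ratios


# Hypothetical counts of houses by class, for the city and for the areas chosen.
city = {"alley": 1_000, "new construction": 2_000, "other": 17_000}
sample = {"alley": 400, "new construction": 400, "other": 3_200}
print(representation_ratios(city, sample))
```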

In the actual field work, the time element has great importance. 
Speed is particularly valuable in the first canvass of the districts, since 
the survey should show conditions at a given time. Therefore the 
field investigators on their first trip should record only two items, viz., 
the total number of houses within the area and the total number of 
vacant houses. This information is quickly gathered and eliminates 
the possibility of the houses having been vacated or rented during the 
course of the more detailed study. 

It will be found when the remaining items of the survey are collected 
that some properties were listed as vacant which were not vacant. 
Their inclusion in the original work of the investigators does no damage 
since only verified totals are of importance. By tallying the number 
of houses in the district a control figure is obtained for the purpose of 
establishing ratios. A further check may be placed on these totals by 
comparing them with the records of the City Assessor’s office, the Fire 
Underwriters’ Atlas or the Real Estate Directory. 

No matter how efficient the field workers may be, mistakes are bound 
to creep into their records. The instruction sheet may be carelessly 
read; hasty work may result in oversights; individual prejudices may
influence judgments. It is of utmost importance, then, that the field
supervisor check over each card promptly after it has been turned in. 
Only by so doing can the number of errors be kept to a minimum. 
There is a tendency for field workers to try to insert missing data from 
memory. In all the Philadelphia Housing Association surveys this 
practice is prevented. Instead the field worker is required to revisit 
the property in question and make notations at the site, just as he is 
expected to do when making his original notes. 

If machinery for tabulating the survey data is not available, it is 
advisable to transfer all data to master sheets. This not only simplifies 
the work of table construction but serves as a further check on the field 
force. Such data, transferred concurrently with the field work, has 
the added advantage of enabling the clerks to start table construction. 
It is important that the transcription of the data from the survey cards 
to the master sheets be checked as a further assurance against mistakes. 

The number and character of the tables to be made will be deter- 
mined largely by the objectives previously established. The principal 
tables should be set up at the time that the survey cards are being 
formulated, the purpose being to assure the inclusion of all items on 
such cards as will be needed to interpret the tables. When these 
preliminary tables have been analyzed, it will almost invariably be 
found that supplementary ones will have to be made and correlations 
brought out where present, as a further contribution to a comprehen- 
sive interpretation. It is just as easy to overdo table construction as 
to be too meagre in the analysis of the facts gathered. However, this 
caution is hardly necessary since no vacancy survey which the writer 
has reviewed has shown any noticeable inclination to overdo the 
process of analysis. 







DEFINITION AND CLASSIFICATION OF VACANCY DATA 
FOR PURPOSES OF ANALYSIS 


By F. L. CARMICHAEL 


This paper is based upon a study made by the University of Denver 
Bureau of Business Research, at the request of the Denver Real Estate 
Exchange.1 It deals with the problems encountered in the study, as
they bear upon the general subject under consideration, and with cer- 
tain implications as to construction trends and future building require- 
ments which may be derived therefrom. While the study included 
both residential and business property, the present paper is limited to 
the residential. 

A composite picture for the city as a whole was not considered suffi- 
cient for the purposes in view. Accordingly, the city was divided into 
thirty-three districts, following a scheme employed by the Denver 
Community Chest, which makes distinctions so far as possible in terms 
of income levels and of racial characteristics. 

Because of the changes that have taken place as to construction type, 
data were desired also by construction periods. The periods chosen 
are: prior to 1901, 1901 to 1915, 1916 to 1925, 1926 to September, 
1930. 

As to residence type, the single-residence group was limited to those 
dwellings that were suitable for occupancy by one family only. If a
building had been constructed as a single residence and later altered to 
accommodate two families, it was put in the two-family classification. 
In case it had been converted into three or more living units, it was 
placed among the rebuilt apartments. 

Again, if a property of the double residence type (with one party 
wall only) was equipped with three or more kitchens, it was considered 
to have three or more living units of the two-family type. Hence, 
the double residence group consists of dwellings originally constructed 
for occupancy by two families (with a wall separating the two living 
units) and still so used. 

In addition to the cases already referred to, the two-family residences 
include buildings of two stories which had a living unit on each floor. 
The distinguishing characteristic of the terrace is the fact that three 
or more living units are separated by two or more party walls. 


1 Inquiries concerning the detailed report entitled Real Estate Inventory and Market Survey of the City 
and County of Denver should be addressed to the Bureau of Business Research, University of Denver, or 
to Wesley J. Towne, Secretary, Denver Real Estate Exchange. A summary of this report appeared in 
the November, 1930, issue of the University of Denver Business Review. 







Rebuilt apartments, all sizes of which were placed in a single group, 
were separated from the apartments built as such originally. The 
latter were classified by number of rooms per living unit; and the four 
preceding residence types, by number of stories in the entire building 
and by number of rooms per living unit. (Buildings having one and a 
fraction stories were combined with the two-story group; two and a 
fraction stories, with the three-story group.) 

The effect of the plan outlined is to make the two-family and re- 
built-apartment groups somewhat hybrid in character and to define 
with a certain degree of exactness the single residence, the double resi- 
dence, the terrace and the apartment (excluding the rebuilt). 

By way of segregating residences of small value from the others, 
those valued at less than $200 were placed in a sixth group. 

In the count of the number of rooms, living-rooms, dining-rooms, bed- 
rooms, kitchens, sun rooms, observatories, offices and dens were in- 
cluded; bathrooms, toilet-rooms, shower-rooms, sleeping porches, halls, 
storage-rooms, stores and breakfast nooks were excluded. The num- 
ber of kitchens served as a guide to the number of living units, one liv- 
ing unit being the residence or the part of a residence or the apartment 
having its own kitchen and suitable for occupancy by one family. 
Data for all types were compiled in terms of the number of living units. 

As to sources of the data, practically all of the basic information had
already been recorded on “Real Estate Appraisal” cards (one card for
each piece of improved real estate in the city), under the direction of 
Clem W. Collins, Manager of Revenue for the City and County of Den- 
ver. These cards, reflecting the status of April 1, were made available 
to the Bureau; they were supplemented by building permit data for 
recent months as a means of bringing the inventory record down to 
September 1. From these two sources a complete inventory was compiled.

The vacancy data were obtained with the assistance of Frank L.
Dodge, Postmaster. Each mail carrier was asked to record, on a form 
prepared for the purpose, sufficient data on each piece of vacant prop- 
erty on his route to enable the Bureau to identify it with the inventory 
record, thus insuring the same classification scheme for vacancies as 
for the inventory itself. 

The foregoing statement outlines briefly the classification scheme 
employed in the Denver study and the sources of the basic information. 
An attempt will now be made to explain somewhat sketchily how data 
so classified may serve as a basis, first, for analyzing construction 
trends and relative demand of residences of different sizes, and second, 
for estimating future building requirements. 







For the present purpose it is necessary to note that an exceedingly 
limited number of the living units now standing were built prior to 
1881, so that “prior to 1901” may be considered as designating a 
twenty-year period. The three following periods are fifteen, ten and 
approximately five years in length. The number built per year during 
each of these periods can therefore be approximated by dividing the in- 
ventory figures by 20, 15, 10 and 5, respectively. (Since replacements 
are disregarded in this process, these averages are admittedly inexact 
and become increasingly so as the age increases. However, they are 
fairly conclusive on certain points.) 

[Chart: PROBABLE FUTURE LIVING UNIT SITUATION, CITY OF DENVER. Population scale 3.5 times the living unit scale; horizontal axis in years.]

Take single residences in the city as a whole as an example. The
annual averages, beginning with the period prior to 1901, are: 844, 932,
1139 and 1287, an increase of approximately 52 per cent from the first
to the last. Similar figures for the one-story, five-room group are:
224, 336, 421 and 530, an increase of 136 per cent; for the two-story,
seven-or-eight-room group, 189, 144, 25 and 28, a decrease of 85 per
cent. These facts, together with vacancy percentages of 3.6 for all
single residences, 2.5 for the one-story, five-room group, and 4.8 for
the two-story, seven-or-eight-room group, would appear to show con-
clusively that the trend is toward the small single residence. (Similar
comparisons may be made for the other residence types and for speci-
fied sections of the city.) The value of such a study to one who is
considering the erection of residential property is apparent.
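
The percentage changes quoted for the single-residence groups can be retraced with a short sketch. The annual averages are those given in the text; the helper name `pct_change` is introduced here only for illustration and is not part of the Denver study.

```python
# Annual averages of living units built per year, as quoted in the text, for
# the four construction periods (prior to 1901, 1901-15, 1916-25, 1926-30).
annual_averages = {
    "all single residences": [844, 932, 1139, 1287],
    "one-story, five-room": [224, 336, 421, 530],
    "two-story, seven-or-eight-room": [189, 144, 25, 28],
}


def pct_change(first, last):
    """Percentage change from the first period's average to the last."""
    return 100.0 * (last - first) / first


# Change from the earliest period to the latest, for each group.
changes = {
    name: pct_change(values[0], values[-1])
    for name, values in annual_averages.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f} per cent")
```

The results agree with the rounded figures in the text: a rise of about 52 per cent for all single residences, a rise of a little over 136 per cent for the small group, and a fall of about 85 per cent for the large group.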







To estimate the building requirements of the future certain supple- 
mentary data, known or estimated, are needed. From the United 
States population census and the present census of living units it ap- 
pears that Denver has 3.5 people per living unit. It is assumed that 
this relationship has existed in the past and will persist in the future 
—not strictly true, of course, but sufficiently so for the purpose in 
view. 

Take the next two decades as the period to be covered by the esti- 
mate. A study of Denver’s population since 1880 indicates that 
350,000 is a reasonable 1950 estimate. 

With these basic facts and noting, as above, that the census of living 
units now standing is 0 at 1880, consider the accompanying chart in 
which the population scale is 3.5 times the living unit scale. The area 
between the population curve and the 1930 census of living units is 
roughly representative of the replacement that has occurred. 

If it is assumed similarly that no residence standing in 1950 will be 
more than 50 years old, the census of living units as of that year will 
run from 0 at 1900 to 100,000 at 1950. It may be seen, therefore, that 
approximately 22,000 living units will be required for replacement 
purposes (since the curve of the 1950 census of living units has an ordi- 
nate of 60,000 at 1930 and the present census is 82,000) and 18,000 liv- 
ing units for population increase. 
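
The estimate just described reduces to a few lines of arithmetic, sketched below. All four input figures are taken from the text; the variable names are illustrative, and the combined total is simply the sum of the two requirements rather than a figure stated in the paper.

```python
PERSONS_PER_UNIT = 3.5          # people per living unit, from the text
population_1950 = 350_000       # the paper's 1950 population estimate
units_1930 = 82_000             # present (1930) census of living units
surviving_1930_stock = 60_000   # ordinate of the 1950 curve at 1930

# Living units needed in 1950 to house the estimated population.
units_needed_1950 = population_1950 / PERSONS_PER_UNIT

# Units needed because the population grows between 1930 and 1950.
for_population_increase = units_needed_1950 - units_1930

# Units standing in 1930 that are assumed to be gone by 1950.
for_replacement = units_1930 - surviving_1930_stock

total_new_units = for_population_increase + for_replacement

print(f"needed in 1950:          {units_needed_1950:,.0f}")
print(f"for population increase: {for_population_increase:,.0f}")
print(f"for replacement:         {for_replacement:,.0f}")
print(f"total to be built:       {total_new_units:,.0f}")
```

The result depends on the assumptions stated in the paper, namely a constant 3.5 persons per living unit and a fifty-year maximum life for residences.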







PROBLEMS IN ANALYZING VACANCY STATISTICS 


By JOHN D. BUSHNELL


This is an opportune time to study and bring to the fore those data 
and practical studies in housing during the past ten years which give 
definite promise of application. The construction industry, with its 
broad ramifications requiring enormous quantities and varieties of 
materials, transportation, great numbers of laborers, and employment 
for the artisans, has already felt the influence of studies of this charac- 
ter. The most rudimentary gauges of the demand and supply factors 
that exist for all forms of housing have proved of inestimable value. 
Each refinement, therefore, which can be added will bring beneficial 
results to the construction industry and business in general. Due to 
the long-lived character of buildings, it is now realized that the con- 
struction industry, by its cyclical changes of demand for men and ma- 
terials, wields a great influence upon conditions in businesses of every 
character. When supported by authentic data and sane interpreta- 
tion, the construction industry is outstanding in its ability to draw on 
the future for immediate activity and business stimulation. 

These vacancy studies strike at the very root of the innumerable
problems that exist, namely, the demand and supply situation. Such
information, however, must be analyzed more carefully and scientifi-
cally than ever before.


A dozen or more factors present themselves when consideration is 
given to a better understanding of what lies behind the best procedure 
for studying vacancies and the best application of these findings. We 
rely very largely upon the value of building permits as the quantitative 
measurement for building volume in this country. The weighting of 
building costs by price indices has assisted in procuring a better gauge 
of the rate of new building. The rent for different classes of homes has 
been studied, but these studies have been confined primarily to the 
lower priced homes occupied by workers, and have approached the 
problem largely from the sociological viewpoint. More thought must 
now be given, particularly in cities that have grown very rapidly and 
are now entering a settled condition, to the economic aspects involving 
such factors as the relationships between income and rent, land cost 
and building cost, rent differentials between single family homes and 
various classes of multiples, and between different commercial loca- 
tions; also, to obsolescence, alterations and demolitions. Data on 
these factors are decidedly lacking. 







Further elaboration of our population data is being accomplished 
with good results, yet this field is but fairly opened when consideration 
is given to the possibilities which intensive local analysis will bring to 
building problems. A knowledge of the character of the population in 
a community is very important. Present authentic totals are now 
available with the 1930 Census just completed, and new and improved 
methods for estimating future population totals in prescribed areas will 
enhance the value of these figures. The size of today’s family, or 
stated better, the great variation that exists in the number of persons 
in the several types of families which have been heretofore considered 
as uniform units, when related to building, demands investigation. 

The changes that have taken place in the living standards of the 
people of this country have had an effect perhaps not fully realized. 
These changes in living standards, particularly when they proceed 
with the rapidity that took place during the recent prosperity era, have 
much to do with such a problem as obsolescence; for obsolescence can 
have many degrees. 

Aside from our changes in living standards, our changes in family 
life are adding turmoil to a proper perspective of the general housing 
situation. We have the question of an increasing number of childless 
families, relatively fewer servants, a much greater part of the living 
outside the home, an enormous increase of women in industry—all these 
and other factors directly reflected in housing problems. These 
changes also greatly influence the volume and varieties of manufactured 
goods formerly produced in the home. This necessitates increasing 
retail activities, a wider range of finished goods, and many new services 
offered from the outside. Comparisons for recent years on the number 
and ratio of various retail outlets reflect these changes. 

The ownership of a home helps considerably to fix that home’s life
and occupancy in a community, and if it were known which homes were
owned and which were not owned, as field work progressed a great
step could be made in narrowing the field where so much intensive
study is needed.


In Los Angeles City and environs the organization which the speaker 
represents has been conducting vacancy surveys for a period of ap- 
proximately ten years. These vacancy studies embrace all types of 
residences in the City of Los Angeles, single family homes in the out- 
lying areas, neighborhood stores and office buildings. 

Dwelling units are divided as follows: Single family dwellings on the 
front of the lot; single family dwellings on the rear of the lot; double or 
duplex houses; bungalow courts (three or more houses on the same or
adjoining lots); flats (three or four family capacities in one building);
apartments (five or more family capacities in one building). Sample 
streets have been selected in all parts of the city and every effort has 
been made to make these representative. They cover the entire city 
and all types of residential sections are included. A periodic revision of 
these sample streets is undertaken and constant additions are made to 
the total area under survey. 

The base for the vacancy studies that this organization has been 
making is an actual house count of homes made in the City of Los 
Angeles in 1925. These data are brought up to date monthly by add- 
ing new building permits. Some homes are not built after a building 
permit is procured; also, the city does not keep accurate data on demo- 
litions and alterations involving family capacities. A number of 
cursory investigations have shown that added capacities through altera- 
tions have a tendency to offset unused permits and demolitions, and the 
work has proceeded upon this premise. 

To determine existing vacancies in all the types of dwellings just 
enumerated (excepting apartments) checkers travel by automobile 
along the sample streets, counting each dwelling and indicating whether 
each family capacity is occupied or vacant. Count clocks are used 
most advantageously for this work. The percentage vacancy figure 
for each type of dwelling is computed from these counts. This proce- 
dure may raise the question of taking into account demolitions and new 
buildings. Included in the check of vacant homes already constructed 
is also a check of the several types of new family capacities under con- 
struction, and a separate computation is made of the percentage to the 
total of new capacities under construction. Because each and every 
present or prospective home on the sample streets is taken into account 
with each survey, the base for making this computation is automatically 
adjusted each time, and demolitions, therefore, have little effect on the 
accuracy of the data. 

This method is not practicable for determining vacancies in apartments.
We secure vacancy data for apartments by sending questionnaires to 
over 2,200 apartment houses in the city. The average return is from 
10 to 20 per cent. These questionnaires are sent blind and are not 
coded in any way, and it has been our experience that a better reply is 
received in this manner. By means of the excellent coöperation ob-
tained through the apartment house associations, we have an auxiliary
check to our own survey data. An independent census is taken of the 
idle apartment houses and their capacities. 

In addition to obtaining the vacancy factor for each type of dwelling, 
a computation is also made of the average vacancy that exists. These
various vacancy factors are weighted according to the relative number
of each type of dwelling in the city. 
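
The weighting just described can be sketched as follows. The function name and the figures in the example are hypothetical and serve only to show the computation; they are not taken from the Los Angeles surveys.

```python
def weighted_average_vacancy(units_by_type, vacancy_pct_by_type):
    """Average vacancy percentage, weighting each dwelling type by its
    number of family capacities in the city."""
    total_units = sum(units_by_type.values())
    weighted_sum = sum(
        units_by_type[kind] * vacancy_pct_by_type[kind]
        for kind in units_by_type
    )
    return weighted_sum / total_units


# Hypothetical figures, for illustration only (not survey results).
units = {"single family": 1000, "flats": 500, "apartments": 500}
vacancy = {"single family": 2.0, "flats": 6.0, "apartments": 20.0}

average = weighted_average_vacancy(units, vacancy)
simple_mean = sum(vacancy.values()) / len(vacancy)

print(f"weighted average vacancy: {average:.2f} per cent")
print(f"unweighted mean:          {simple_mean:.2f} per cent")
```

In this made-up case the weighted figure is lower than the simple mean, because the type with the highest vacancy accounts for only a quarter of the family capacities.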

With the increased growth of suburban communities surrounding the 
City of Los Angeles, it has become necessary to make an annual survey 
of vacancies in these communities. Only single family dwellings are 
checked, but the survey covers some 150 miles of streets and approxi- 
mately 13,000 family capacities. Without additional time and expense 
an interesting and valuable record is obtained of the number of “For
Sale” signs displayed in front of these homes. A year-to-year record
of these “For Sale” signs has proven a good index of the firmness with
which these single family homes are held either by the occupant and 
resident owner or the absentee owner. 

An annual vacancy check on outlying neighborhood stores is of value. 
In the City of Los Angeles a check is made of all suburban districts by 
excluding the centralized downtown business area. This work is done 
by checkers on sample business streets and approximately 13,300 retail 
store capacities are checked. It has proven more advisable to use the 
tally system of checking in this work, yet this checking of neighborhood 
store vacancies is becoming an increasing problem. Already our ex- 
perience shows that we must increase the survey during the coming 
year and, particularly, include certain cities adjacent to the City of Los 
Angeles. 

A check of the vacancies that exist in office space is con-
ducted each year in December. Field men go through each floor of 197
office buildings, with approximately 11 million square feet of rental 
space, and observe the vacancies. The total or average vacancy figure 
for office buildings is also computed on a weighted average basis, each 
building being weighted according to the amount of floor space. The 
results from such a survey of office buildings ought to be substantially 
correct. A vacancy, if it exists, in one or several offices is easily dis- 
cerned. The possibility of error through a sample is eliminated, as 
actual count is taken of all major office buildings and semi-loft buildings 
in the central portion of the city. 


The rapid growth in the number of apartment houses creates some 
new problems in these vacancy studies. Eliminating for the moment 
the causes for this increase in apartment houses, from a purely statis- 
tical viewpoint this trend raises several interesting questions. Recent 
data in Los Angeles show that single family homes represent approxi-
mately 44 per cent of the family capacities, and apartments approxi-
mately 22½ per cent of the family capacities. The vacancy at this
time in single family homes is 1.8 per cent; the vacancy in apartments,
24.1 per cent. With a total of approximately 192,500 family capacities
in single family homes and 98,500 family capacities in apartments 
these percentage vacancy figures would indicate approximately 3,465 
vacant family capacities in single family homes and 23,739 vacant 
family capacities in apartments. 
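
These counts follow directly from the totals and percentages quoted, as the sketch below shows. It also includes a consistency check that is not in the original: the city total implied by the 44 per cent share of single family homes, and the apartment share that follows from it. Variable names are illustrative.

```python
# Totals and vacancy percentages as quoted in the text.
single_family_units = 192_500
apartment_units = 98_500
single_family_vacancy_pct = 1.8
apartment_vacancy_pct = 24.1

# Vacant family capacities implied by each percentage.
vacant_single = single_family_units * single_family_vacancy_pct / 100
vacant_apartments = apartment_units * apartment_vacancy_pct / 100

# Consistency check: if 192,500 single family homes are 44 per cent of all
# family capacities, the city total and the apartment share follow.
implied_city_total = single_family_units / 0.44
apartment_share_pct = 100 * apartment_units / implied_city_total

print(f"vacant single family capacities: {vacant_single:,.1f}")
print(f"vacant apartment capacities:     {vacant_apartments:,.1f}")
print(f"implied apartment share:         {apartment_share_pct:.1f} per cent")
```

The apartment figure works out to 23,738.5, which the text gives rounded up as 23,739, and the implied apartment share is about 22.5 per cent of all family capacities.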

The question now comes—how many persons instead of families will 
it take to fill these vacancies? Presuming a factor of 3.5 persons per 
family for all families wherever living in Los Angeles, it is known that 
you cannot apply this factor to families living exclusively in single 
family homes nor to families living exclusively in apartments. It is a 
safe conjecture that the average in single family homes is more than 4 
persons per family. It is doubtful whether the average for apartments 
would be over 2.5 persons per family. 

The absolute totals of vacant family capacities that I just spoke of 
in single family homes and apartments would indicate at first thought 
that the apartment vacancy represented nearly 7 times as great a 
capacity as the single family home vacancy. But after giving effect to 
4 persons per family in the single family homes and 2.5 persons in 
apartments, the apartment vacancy in terms of persons would be only 
about 4 times as large as for the single family homes. 
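
The comparison in terms of persons can be checked with the sketch below. The vacancy counts are those quoted in the text, and the factors of 4 and 2.5 persons per family are the speaker's own conjectures, not measured values.

```python
# Vacant family capacities as quoted in the text.
vacant_single = 3_465
vacant_apartments = 23_739

# Persons per family assumed in the text for each type of dwelling.
persons_per_single_home = 4.0
persons_per_apartment = 2.5

# Ratio of apartment vacancy to single family vacancy, counted in units.
ratio_in_units = vacant_apartments / vacant_single

# The same ratio counted in persons needed to fill the vacancies.
persons_for_single = vacant_single * persons_per_single_home
persons_for_apartments = vacant_apartments * persons_per_apartment
ratio_in_persons = persons_for_apartments / persons_for_single

print(f"ratio in family capacities: {ratio_in_units:.2f}")
print(f"ratio in persons:           {ratio_in_persons:.2f}")
```

The two ratios come to a little under 7 and a little over 4, in line with the figures given in the text.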

Our data indicate that nearly 10 per cent of the total population of 
Los Angeles in the summer and approximately 13.5 per cent in the 
winter months is non-permanent. For recent years the total vacancy 
in apartments has been from 13 to 28 per cent. Probably a 15 per cent
average vacancy would fairly well take care of the seasonal fluctuation, 
which is quite pronounced, in the demand for these apartment capaci- 
ties. But aside from this phase of the question a city of over one and 
one-third million persons presents internal population problems that 
are finding very direct reflection in apartment houses. With both 
members of a family working and no children, an apartment is the most 
economical and the most desirable place to live. In addition to tourists 
that come for a specified time, many thousands come to the city un- 
decided whether they will stay or not, and these persons first occupy 
apartments. With a city the size of Los Angeles, land values within 
a certain radius of the city become too high for single family homes, 
and, therefore, an apartment is necessary in order to realize upon this 
land value. Undoubtedly the building of apartments has been a 
profitable, promotional objective for certain types of financing, and for 
building material dealers, contractors and the labor representatives. 

A year-to-year comparison of vacancy data by small neighborhood 
districts gives a trend and a better knowledge of the changing charac- 
teristics of the district. The general practice is the simple counting of
store units, irrespective of size. Computations on a square foot basis
or a measurement of the street frontage for each unit would be more 
desirable, but the cost would be prohibitive. Absolute vacancy data 
pertaining to stores without looking further is sometimes deceiving. 
In a new district there may be many vacancies but rapidly developing 
potentialities should be considered along with these high vacancy 
figures. Conversely, a long-term lease will hold a tenant even though 
he greatly wishes to be elsewhere with his business. Changing uses of 
store rooms may still hold the vacancy figures at about the same level, 
but careful observation would probably reveal definite trends towards 
either improvement or decadence. A five-year comparison of store 
vacancy data in outlying districts shows a decreased percentage to the 
total occupancy for bank buildings, material dealers, drug stores,
groceries and markets, and real estate offices, but an increase for cloth-
ing stores, dry-goods stores, shoes, electrical goods, offices and restau-
rants.

Management, location, accommodations, rent, age and other factors 
influence the vacancy of office buildings. Location is most important. 
With competition for tenants, however, becoming keener and so many 
offices now offering good locations and all physical accommodations, 
tenants are becoming more particular than ever. One of the great 
office building problems at present is the selection of tenants. Office 
buildings are progressively taking on more specific identities. Tenants 
are demanding not only the highest character neighbors, but neighbors 
of their kind. 

The social and business habits and increasingly complex demands of 
society, from now on, must be more definitely and more extensively 
coördinated with quantitative data and analyses. Vacancy data,
scientifically gathered and properly assembled, will answer a part of the 
question. But to understand more fully these causes and effects, to 
increase the practicability of these studies, and to strengthen forecasts 
that must be made for the builder, investor or prospective occupant, 
the results of social, industrial and marketing studies must be more and
more applied.










PRACTICAL USES OF VACANCY STATISTICS 


By H. MORTON BODFISH


The collection of vacancy statistics was first planned and undertaken 
by the least trained and scientific of American trade groups, namely, the 
real estate fraternity. Some technique for collecting statistics has 
developed and the practicableness of their rough results is attested by 
the fact that in the immediate year over seventy real estate groups in as 
many cities have collected such data. The interest and importance 
of such statistics is further attested by the ease with which the real 
estate group has enlisted the coöperation of financing agencies, con-
tractors’ organizations, the United States Post Office and others in
collecting such statistics. 

The general collection of statistics was undertaken fitfully by housing 
groups and real estate groups some years ago, but the initial city-wide 
vacancy counts seem to have been in Los Angeles around 1922 and
Columbus, Ohio, in 1924. We can hardly expect such a practitioners’ 
device to come of age as to procedure and comparability in so short a 
period. 

The real estate operator, builder or merchandiser suffering from the 
episodic nature of his business looks hopefully to this first effort to 
replace general judgment or impressions with quantitative data. We 
can hardly expect him to advise his customer not to build or purchase, 
but his salesmanship, through vacancy statistics, can be tempered with 
a more factual point of view. If vacancy statistics are sufficiently 
detailed, and are available to the public, the public should be able to 
avoid investment and purchase in very weak areas. 

A financing agency like a building and loan association or bank 
probably has as much practical use for vacancy statistics as anyone in a 
city. The subtle trends in regard to property values, active market 
changes and migrations both of business and individuals impair the 
ultimate security of their long-time commitments. These trends are 
seen first in vacancies. 

Of course the above points are quite as applicable to the real estate 
broker and to the public, assuming they are equally astute and inter- 
ested in the long-time performance of real estate investments. It is
questionable, especially in the case of the public, whether they can 
intelligently utilize such information. It seems quite probable that the 
material dealer and manufacturer of building materials can find in 
vacancy statistics some guide as to the demand for the coming year,
assuming that the surveys have been carried on for several years so
that some semblance of trend can be seen. I know of cases where store 
location men studied vacancies with almost as much interest as they 
made traffic counts. 

Office building vacancy information should be most useful because of 
the general accuracy of such counts and the fact that gross totals are 
available, due to the concentrated location of the buildings. They are 
generally owned and rented in units which can be easily organized for 
count. Vacancy statistics are useful to both the building owner and 
the building occupant in working out space rates. A real measure of 
vacancy will assist the building manager in adjusting his rates so as to 
maintain optimum occupancy and still not be unduly moved by the 
allegations of a sharp space buyer. 

Bond houses and insurance companies are especially eager for the 
vacancy information, which would indicate that as financing agencies 
they attach great practical importance to the conditions of supply in 
the market in which they are about to finance a new venture. 

The use of vacancy statistics in appraising is important. Careful 
appraisers in valuing all types of property check the vacancies in the 
immediate neighborhood, and general vacancy statistics will not only 
facilitate this count but give them a picture of the general situation. 

It is possible that vacancy information will be useful to subdivision 
developers and city planning groups. Regrettably, subdividing is 
generally a function of general income rather than of need for town lots, 
so that their production is always quite in excess of any immediate 
needs which might be shown by vacancy statistics. City planning is 
still dealing in too broad outlines to need sensitive measures of neigh- 
borhood or business district change. 

All these comments relate to the construction industry, also. Prob- 
ably no industry produces more blindly for its market than has the 
construction industry in the past. That it is harmful for all can be 
granted. For example, there is no doubt but that a continued over- 
production in the house market destroys the normal market and pre- 
vents fair and necessary liquidation—a condition discouraging to new 
home ownership and embittering to those who have chosen this socially 
desirable form of investment. Market knowledge of this character 
should contribute to the stabilization of building operations. In so far 
as building is stabilized and as accrued information eliminates un- 
founded guesses as to actual conditions, rentals and values are stabilized. 

Judgments leading to stabilization are difficult due to the absence of 
norms. Even the practitioners, not to mention the people, have not 
yet fully appreciated the entrance of style and the rapid obsolescence
in the consumption of real estate, which will materially affect vacancy
figures. 

Today the practical value of vacancy surveys is limited, first, by the 
absence of a standard collection technique. There are no satisfactory 
norms at present. Accuracy is another important objective which 
demands both complete inventories and a standard classification.
Standard classification should be quite detailed and include not only 
the types of improvement, but should check both the value and the age 
of the improvements. The classification developed by the National 
Associations of Real Estate Boards is only partially satisfactory. It 
utilizes the general concepts of urban land improvements. They are: 
single-family dwellings; flats and duplexes; apartments; ground floor 
shops and stores; offices; loft buildings; warehouses; industrial proper- 
ties; vacant properties classified according to use. An ideal classifica- 
tion for an inventory and count would include vacant lots. Vacant 
business should be measured in terms of front feet rather than numbers 
of stores. Numerous other refinements can be mentioned. The 
development of careful inventories, such as that in the recent Denver 
survey or the Oak Park survey, will provide the materials for an ac- 
curate, detailed and useful standard classification. 

The practical value of vacancy statistics in the past has been under- 
mined by the absence of inventories. Percentages mean little if the 
number of vacancies is accurate while the gross supply is determined by 
taking the realtors’ population and dividing it by 4.2, the so-called 
average family, to ascertain the number of housing units. This neces- 
sity for accurate base figures has been recognized in several of the recent 
surveys which included careful inventory of the types and number of 
units of property in existence with the resultant accuracy and com- 
parability in the percentages obtained. 
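
The effect of the base figure on the percentage may be illustrated by a brief sketch. The figures below are hypothetical, chosen only for round arithmetic, and the function names are illustrative rather than part of any survey procedure. The same count of vacancies yields a different percentage according to whether the supply is estimated from population divided by 4.2 or taken from an inventory.

```python
# Hypothetical figures for illustration only; none come from a real survey.
AVERAGE_FAMILY_SIZE = 4.2  # the "so-called average family" divisor


def vacancy_rate(vacant_units, total_units):
    """Return vacancies as a percentage of the gross supply of units."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return 100.0 * vacant_units / total_units


def units_from_population(population, family_size=AVERAGE_FAMILY_SIZE):
    """Estimate the housing stock by dividing population by family size."""
    return population / family_size


vacant = 1_500              # assumed accurate count of vacancies
population = 126_000        # assumed population estimate
inventoried_units = 33_000  # assumed result of a direct inventory

estimated_units = units_from_population(population)
rate_estimated = vacancy_rate(vacant, estimated_units)
rate_inventory = vacancy_rate(vacant, inventoried_units)

print(f"population base: {rate_estimated:.2f} per cent")
print(f"inventory base:  {rate_inventory:.2f} per cent")
```

With these assumed figures the population base gives about 5 per cent and the inventory base about 4.5 per cent, a difference large enough to change the reading of the market under the norms quoted for residential property.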

Time and frequency of surveys are devoid of uniformity. More 
frequent counts are essential. A number of successive periodical 
vacancy counts, either annually, semi-annually or quarterly, will per-
mit the calculation of absorption rates of a rather general nature. Such
information has considerable utility to those planning new enterprises. 

Practical uses are also limited by problems of interpretation. In- 
dividuals in the real estate business do not yet know how to interpret 
vacancy statistics, nor does anyone else. Of course we are interested 
primarily in the price results of increasing vacancies. It is said that in 
residential properties a 4 per cent vacancy is normal; but the vacancy 
can go to 6 or 8 per cent before breaking the market in terms of price. 
In connection with apartments a 10 per cent vacancy is considered 
normal, and some say that 12 to 15 per cent is necessary before price
changes become evident. The extent to which real estate values are
going to be sensitive to changed vacancy conditions remains to be 
determined. That over-supply reduces price is obvious. How much 
over-supply and how we are to regard the varying age of empty struc- 
tures is still rather uncertain. It cannot be stated too emphatically 
that vacancy surveys should be interpreted in terms of their effect upon 
the price of realty. All too often surveys have been conducted and 
those participating had no results other than a figure which they 
thought proved that their city was normal or subnormal in its vacancies. 
The pricing of realty from an investment point of view is all-important 
to investors, financing organizations, tax bodies and to all who have 
contact with realty in any way. This is repetitious, but there is so 
continually encountered a lack of appreciation of the price implication 
of vacancies that I burden these remarks with the repetition. 

The practical use of vacancy statistics multiplies from a local point 
of view when the data are districted and mapped so that weak spots, 
migrations and the like appear. The most enlightening phase of 
several recent surveys has been the presentation by map. Dots or 
percentages are used in numerous districts, showing the exact condition. 

The question of chronic vacancy has not been raised in connection 
with surveys or counts and the interpretation of them. By this I do 
not mean the normal percentage of vacancies, but what types in our 
classification are chronically vacant, and in what areas do they lie. In 
other words, a percentage or numerical count alone does not indicate 
your supply in the area if the area under examination contains a num- 
ber of structures or accommodations which are chronically vacant. 
This may be due to obsolescent location and other such reasons. 
This notion is very important. 

A major assistance which vacancy statistics should render to current 
business analysis can be tested by the question—are vacancy statistics 
in their present form of use to the forecaster or general statistician in 
describing and predicting construction trends or real conditions? I 
think we will all agree that the forecasters of construction have had to 
rely entirely upon comparisons with a normal volume of production 
and have not attempted to look at the consumption side. Seldom do 
we find reference to vacancy, overbuilding and the like in their pro- 
nouncements. The previously mentioned lack of comparability, 
established norms, regular time of count and clearing of statistics 
eliminates almost any utility to the general statistician. 

Practically all of the present statistics deal with annual or semi-an- 
nual counts, quarterly counts being undertaken by the operators of 
office buildings in larger cities. Without questioning the usefulness of
accurate recurring counts it appears that little attempt has been made
to find indexes or collect statistics which would be sensitive to change 
or show month-to-month increase or decrease. 

Growing vacancies should be the real alarm signal for the individual 
buying property for investment, the builder producing for speculative 
sales, or the money lender making long-time commitments. Any 
barometer which would indicate the movement in vacancies would 
certainly forecast important movements in rental and sales prices, 
which are less apt to be indicated by the typical yearly counts. It 
seems that the present type of survey and the resulting data fail to 
answer this most important of practical needs—a month-to-month 
index of vacancy in several types of property, particularly in residential 
real estate and office space. Such a barometer would be watched with 
great interest and, if accurate, would do much to level the present over- 
production and the stagnation which characterizes the construction 
industry. 

Several years ago the author participated in a study which resulted in 
the use of a gas meter index to indicate trends in several Ohio cities. 
The series covered a period of eight years and have been watched with 
considerable interest by builders and financing agencies, although there 
is little evidence to indicate any sensitiveness of the construction 
contracts series to increasing vacancy. It seemed in the main that 
movement in the construction indexes appeared only after vacancies 
had accumulated.¹ Several experimental indexes based on the volume
of rental advertising failed to show accurately the changing rental con- 
ditions. There are real possibilities in electric meter vacancies, al- 
though in some cities the records are not kept in such a way that they 
are useful. 

In conclusion, the practical uses of vacancy surveys must be ad- 
mitted to be largely local, and these surveys are valuable primarily to 
the real estate industry and financial institutions in the particular 
locality. Improved and standardized technique for periodic counts 
and the development of month-to-month indexes should be the next 
steps in increasing or multiplying the practical uses of vacancy surveys. 

The National Association of Real Estate Boards is preparing revised 
forms for making counts and inventories and will endeavor to have a 
count made in all the principal cities in the United States as of March 
first during the coming years. This is encouraging and we can expect a 
vast improvement in our vacancy statistics in the near future with 
proportionate increased practical usefulness. 


¹ See Supplement to the Bulletin of Business Research, Ohio State University, Vol. 2, No. 5, June,
1927; also current issues. 







SUMMARY OF THE ROUND TABLE DISCUSSION OF THE 
RELATION OF THE AMERICAN STATISTICAL ASSOCIATION 
TO INTERNATIONAL STATISTICS 


By E. Dana Durand








At this round table no formal papers were presented, but members 
present described the recent development and present situation of the 
various international movements for standardizing, extending, and 
improving economic and social statistics; and discussed informally the 
relations of the United States Government and of the American Statis- 
tical Association to these movements. 

The chairman described and discussed especially those international 
organizations established by formal treaty among governments and 
dealing with economic statistics. He referred first to the extensive 
activity of the International Institute of Agriculture since its establish- 
ment at Rome in 1905 and pointed out the recent friction which had 
developed within that organization, one result of which has been the 
withdrawal of the permanent delegate of the United States from active 
participation in the work. 

He stated that the organization set up at Brussels, under an inter- 
national treaty in which the United States did not participate, for the 
publication of export and import statistics according to a standard 
nomenclature, had virtually failed to function. The convention was 
adopted only just before the outbreak of the World War, and the War 
completely prevented it from obtaining the necessary statistical data 
from the various governments concerned. By reason of subsequent 
efforts, sponsored by the League of Nations, for a still more satisfactory 
standard nomenclature, there has been little effort to revive the Brus- 
sels organization. 

The chairman further described the activities of the various com- 
mittees of the League of Nations and the various international confer- 
ences called by the League with respect to the three fields of financial 
statistics, transportation statistics, and general economic statistics. 
He pointed out that although there is naturally a close relation between 
all fields of economic statistics, the segregation of the activities of the 
League into various sections and organizations had brought about a 
segregation, also, of its efforts with regard to improving statistical work 
in the various fields. The most important action taken by the League 
in statistical matters has been the International Conference on Eco- 
nomic Statistics, held in Geneva in 1928, to which the United States
sent a delegate. This Conference drew up a convention which has been
signed by between twenty-five and thirty nations. The convention 
was to become operative upon the ratification of the signature by ten or 
more nations, and this number had actually ratified in December, 1930, 
so that the treaty is now in force. It requires the several governments 
to collect statistics of foreign trade, agriculture, fisheries, and mining 
and manufacturing industries, prescribing a minimum amount of detail 
to be included and establishing certain standards of method and nomen- 
clature. Although these requirements as to scope and method are very 
modest as compared with the advanced statistical work of the United 
States and of certain other governments, they represent a material ad- 
vance over the practices of many other countries. The United States 
Government has not as yet signed this treaty. If it later does so, it 
will be necessary to make a few very minor modifications in the statis- 
tical practice of the government. The treaty provides for the appoint- 
ment of an Expert Committee to interpret its decisions and to take 
measures for further standardization and improvement of economic 
statistics. It is understood that an American will be appointed as a 
member of this committee, regardless of the question whether the 
United States Government signs the treaty. 

Mr. Leifur Magnusson described the statistical work of the Inter- 
national Labor Office of the League of Nations. He pointed out that 
although no international convention had been proposed obligating the 
various governments to collect statistics in the field of labor, or pre- 
scribing standard methods, various conferences held under the auspices 
of the organization had accomplished much in this direction by recom- 
mendations to the several governments. The Labor Office publishes 
useful summaries of the statistics collected in the various countries. 
It obtains directly through its own representatives statistics of wages 
in certain industries in several leading cities in each of a number of 
major countries, together with information concerning the cost of living 
in those countries. The Labor Office has recently undertaken a more 
ambitious inquiry as to cost of living, at the request of the European 
representative of Henry Ford, in connection with his project of estab- 
lishing factories or assembly plants in various foreign countries and of 
undertaking to pay wages there equal in terms of buying power to those 
paid by the Ford concern in the United States. The speaker expressed 
the opinion that results of this investigation would be extremely useful 
from a broad economic point of view. 

Dr. E. A. Goldenweiser described the movement for the international 
standardization of statistics of banking and currency, which centered in 
a conference at Paris of statisticians and economists connected with the
leading central banks of Europe and the United States. This confer-
ence was called at the instance of the League of Nations and was held 
in 1928. The proceedings have not been made public, but the result of 
the conference was an increased degree of comparability in the statis- 
tical and financial reports of the central banks. 

Dr. Walter F. Willcox, who is a vice-president of the International 
Institute of Statistics, discussed the problems and difficulties which 
have confronted that Institute since the World War. He pointed out 
that the development of new organizations directly sponsored by gov- 
ernments had resulted in a certain amount of uncertainty as to the 
functions of the Institute. In particular there have arisen problems as 
to its relation to the various statistical activities of the League of Na- 
tions. The questions involved relate both to the activities of the Per- 
manent Office of the Institute in collecting and publishing comparative 
international statistics, and to the function of the Institute as regards 
the recommending of standard statistical practices. 

The speaker pointed out that these problems had been the subject of 
a number of published papers by members of the Institute, and of dis- 
cussion at its recent meetings. At the meeting in Warsaw in 1929, a 
special committee was set up to consider the revision of the constitution 
of the Institute, under the chairmanship of Dr. G. Jahn, of Norway; 
Mr. Durand is the American member of this committee. The com- 
mittee is expected to report at the meeting of the Institute in the au- 
tumn of 1931, and Dr. Willcox expressed the hope that as many as 
possible of the American members of the Institute would attend that 
meeting and contribute to the discussion of these difficult problems. 

There was also discussion led by Dr. Willcox as to the desirability and 
practicability of holding a meeting of the International Institute of Sta- 
tistics in the United States. This subject was further considered at a 
luncheon meeting of the American members of the Institute held on 
the same date. 
















A SIMPLE THEORY OF ECONOMIC CRISES 


By Griffith C. Evans


By its very nature, the theoretical economics of the past has tended 
to leave in obscurity some of the most important practical problems. 
It is a fault of its metaphysical point of view, and its failure has been 
accentuated in the dominant mathematical school, which emphasizes 
this metaphysical aspect—assuming that theory is for the sake of 
theory rather than concrete prediction, and discounting the value of 
the practical test. In particular, the fact of lack of equilibrium in 
economic systems continually, and practically, stares us in the face; 
yet the principal discussion from a theoretical point of view has been 
of equilibrium, and thus at one stroke has eliminated a major issue. 

To take the simplest possible case, consider a single commodity and
suppose that the number of units y of it, which would be bought in
unit time at price p, is given by a linear function


(1) y=ap+b, a<0, b>0, 


and suppose that the number of units u of it, which would be offered in 
unit time on the market, is given also by a linear function 


(2) u=rp+s, r>0, s<0.


One proceeds naturally to say that the condition that there should be
equilibrium is that offer and demand should be equal, i.e., u=y,
whence

$$ap+b=rp+s$$

and

$$\bar{p}=\frac{b-s}{r-a}, \qquad \bar{u}=\bar{y}=\frac{rb-as}{r-a}. \tag{3}$$


The graph of the functions (1), (2) reveals two straight lines inter-
secting at the point of equilibrium. Its abscissa is $\bar{p}$ and its ordinate
is $\bar{u}=\bar{y}$. If one investigates the process of arriving at this point, he is
inclined to dismiss the matter with the remark that if the price is too
high the offer will exceed the demand, according to the chart, and the
price will therefore fall, moving towards the point of equilibrium.
Similarly the price will rise if y>u.

But the conclusion that the price reaches $\bar{p}$, or even tends toward $\bar{p}$,
demands further consideration. Thus the two processes in the ac-
companying charts illustrate changes of price by means of which offer
and demand are successively satisfied, according to (1) and (2).












[Charts I-IV: offer (O) and demand (D) lines plotted against price p. The note below refers to Chart III.]











* Here at (1) offer exceeds demand. Price is reduced until demand is sufficient at (2). But then offer 
is reduced according to this price, at (3). Then price is increased to amount at which this quantity will 
just be sold, etc. 

In Chart II the price tends toward $\bar{p}$, oscillating about it; but in
Charts III and IV, the price oscillates about $\bar{p}$, yet gets further and
further from it.
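
The successive adjustment described in the note to Chart III can be traced numerically. The following is a minimal sketch under stated assumptions: the coefficients are hypothetical, the function names are illustrative, and the rule is the one given in the note (the quantity offered at the current price is sold at whatever price the demand line requires). Under this rule the deviation of the price from equilibrium is multiplied by r/a at each step, so the oscillation dies out when r is smaller than |a| and grows when it is larger.

```python
def equilibrium(a, b, r, s):
    """Equilibrium price and quantity from equations (1)-(3)."""
    p_bar = (b - s) / (r - a)
    u_bar = (r * b - a * s) / (r - a)
    return p_bar, u_bar


def adjust(p0, a, b, r, s, steps):
    """Successive prices under the rule of the note to Chart III."""
    prices = [p0]
    for _ in range(steps):
        offered = r * prices[-1] + s      # offer at the current price, eq. (2)
        prices.append((offered - b) / a)  # price at which demand absorbs it, eq. (1)
    return prices


# Hypothetical coefficients (a < 0, b > 0, r > 0, s < 0).
damped = adjust(5.0, a=-2.0, b=10.0, r=1.0, s=-2.0, steps=6)     # |r/a| = 0.5
explosive = adjust(5.2, a=-1.0, b=10.0, r=1.5, s=-2.5, steps=4)  # |r/a| = 1.5

print("damped:   ", [round(p, 4) for p in damped])
print("explosive:", [round(p, 4) for p in explosive])
```

In the first run the price oscillates about its equilibrium value of 4 and approaches it; in the second it oscillates about 5 and recedes from it, until the linear hypotheses cease to be tenable.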

The investigation of such processes is thus seen to be obscured by 
the kind of relation which one assumes to be typical. It is usually 
assumed that such relations—however general—can be written in 
terms of quantities which are all given at the same time. And the 
hypothesis does not seem strange if attention is focused merely on the 
conditions which are likely to hold when prices are constant; neverthe- 
less, in assuming that the same kind of relation holds when prices are 
not in equilibrium, one makes a simplifying assumption which has 
hardly any basis in fact. The figures which have just been drawn 
represent particular assumptions with regard to the processes which 
may occur while prices are varying. 

Different sorts of hypotheses are possible. Thus, in monopoly pro- 
duction, the accompanying charts have no application since there is no 
offer as an independent function of the price; yet the production may be 
kept equal to the demand since the demand is known. If the profit, 
which is the selling value up minus the cost q(u), is to be made a maxi-
mum, there is determined a definite equilibrium price (the “Cournot”
price) provided that the demand, y=u, is a proper function of the price,
(y=f(p)). This last assumption may be justifiable if the price is 
changing slowly enough, and if the prices of other articles and their 
consumptions are not changing in such a way as to affect the signifi- 
cance of the total income of the consumer. But as the system ap- 
proaches a crisis, prices are changing neither slowly nor singly. 
However, in order to separate the difficulties, let us consider briefly 
two cases: first, where one price alone is changing, but changing rapidly 
enough for the effect of change as motion to appear; second, where the 
prices are changing slowly but are all changing in a coordinated fashion.
In the first case we may write the demand as depending on the
rapidity of change, and take, to a first approximation, $y=f(p,p')$.
The simplest hypothesis that we can make is that the relation is
linear, that is, at time $t$,

$$y(t)=ap(t)+b+hp'(t).$$

It is natural to take $h>0$, since demand is greater, in general, if prices
are going up than if they are going down. If we take the cost function
also in simple form $q(u)=Au^2+Bu+C$, the profit per unit time is given
by the formula

$$\pi(p,p')=pu-q(u)=p(ap+b+hp')-A(ap+b+hp')^2-B(ap+b+hp')-C.$$


We may now imagine that starting at a time $t_0$ with a given price $p_0$,
the producer tries to make his total profit a maximum from then on,
say until a second time $t_1$. It turns out that the problem is easily
solvable, and if the interval $t_1-t_0$ is short enough, $p$ may be determined
as a function of $t$;¹ that is, the best varying price may be given for all
times between $t_0$ and $t_1$. The problem is to make the quantity

$$\Pi=\int_{t_0}^{t_1}\pi(p,p')\,dt, \qquad \text{given } p(t_0)=p_0, \quad u(t)=y(t)=ap(t)+b+hp'(t),$$

a maximum, by a proper choice of $p$ as a function of $t$.
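
As a sketch of why the problem is tractable, one may write down the Euler equation of the calculus of variations for this integrand. This is the standard condition applied to the formula for the profit per unit time, with $u=ap+b+hp'$; it is offered as an illustration and not as a reproduction of the cited chapter.

```latex
\begin{aligned}
\frac{\partial \pi}{\partial p}  &= u + a\,p - 2Aa\,u - aB, \\
\frac{\partial \pi}{\partial p'} &= h\,p - 2Ah\,u - hB, \\
0 &= \frac{\partial \pi}{\partial p} - \frac{d}{dt}\,\frac{\partial \pi}{\partial p'}
   = 2Ah^{2}\,p'' + 2a(1 - Aa)\,p + b - 2Aab - aB .
\end{aligned}
```

The condition is a linear differential equation of the second order with constant coefficients, so its solutions can be written down explicitly. A constant price satisfies it only if $2a(1-Aa)p+b-2Aab-aB=0$, which is the first-order condition for maximizing the static profit $p(ap+b)-q(ap+b)$.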

¹ G. C. Evans, Mathematical Introduction to Economics, New York, 1930, Chapter 14.

This result may be described as indicative of a “stable” situation.
But it is valid only as long as $t_1-t_0$ is less than a certain interval of time.
If the producer is foresighted unduly, and estimates on too long a
period of time, the result is quite the reverse. In fact, it turns out,
according to the mathematical analysis of the problem, that if $t_1-t_0$ is
greater than a certain interval $T$, which depends merely on the con-
stants a, b, h, A, B, and not on $p_0$, no matter what function $p(t)$ is
taken in $(t_0, t_1)$, another one may be chosen so as to yield a greater
profit.¹ There is no maximum $\Pi$. The situation then cannot remain
within the bounds for which the hypotheses in regard to q and y are
applicable.

A situation which makes a change of hypothesis necessary may be
described as a “crisis” (in this case, for the particular commodity).
The definition is of course abstract. But it corresponds to the fact 
that as a system approaches a crisis, the laws heretofore applicable 
cease to be so, and the system, as originally postulated or described, 
breaks down. 

It may be noticed that if the price is held constant, the effect of the 
hp’(t) does not appear, and the only constant price which will satisfy 
the conditions is the Cournot price of equilibrium. The study of 
equilibrium thus tends precisely to eliminate the consideration of the 
possibilities which are most significant when the prices are changing. 
The failure of stability, for an interval of time sufficiently long, is not
an accidental result, consequent upon the particular hypotheses of linear
demand and quadratic cost functions, but appears also in the non-
special case of this class of problem, that is, the problem of the Calculus of
Variations in which there is a variable end point. In other words, it
appears from this kind of analysis that the stable situation is likely to
be the exception in the economics of moving prices, rather than the
crisis.²

We consider now, also briefly, an instance of the second kind of spe- 
cialized situation, viz., where prices are changing so slowly that we 
need not consider the effect of the rate of change of price dp/dt, but 
where all the prices are changing in a coordinated manner. In order
to make this investigation we shall relate the various prices to the 
price index. 

¹ Evans, loc. cit., pp. 152-153.

² The above analysis may be regarded as particularly applicable to speculation in the stock market.
The régime is not usually one of monopoly, but of a limited kind of competition, and to it the same gen-
eral remarks apply.

There are, of course, many formulae for price indices, all meant to
portray about the same relations. Thus if we consider two times $t_0$, $t_1$
(or two short intervals of time about each of these times), we some-
times write

$$\frac{P(t_1)}{P(t_0)}=\frac{\sum p(t_1)u(t_0)}{\sum p(t_0)u(t_0)} \qquad \text{or} \qquad \frac{P(t_1)}{P(t_0)}=\frac{\sum p(t_1)u(t_1)}{\sum p(t_0)u(t_1)}$$

or even some more complicated mean of prices or price ratios. We
may or may not put $P(t_0)=1$. The two forms just given, and many
others, may be regarded as approximate calculations of an instantaneous
price index. In fact, from the first of the above formulae,

$$\frac{P(t_1)-P(t_0)}{P(t_0)}=\frac{\sum u(t_0)\{p(t_1)-p(t_0)\}}{\sum u(t_0)p(t_0)}$$

and if $t_1$ is close to $t_0$, writing $t$ for $t_0$,

$$\frac{dP(t)}{P(t)}=\frac{\sum u(t)\,dp(t)}{\sum u(t)p(t)} \tag{4}$$


This analysis is not intended to be a deduction of the formula for the
instantaneous price index. Indeed, its deduction is much simpler
and rests on fewer hypotheses than any of the approximate forms given
previously. The formula (4) is due to Divisia.¹ Corresponding to it
he deduces also the correlative instantaneous trade index

$$\frac{dU(t)}{U(t)}=\frac{\sum p(t)\,du(t)}{\sum u(t)p(t)} \tag{5}$$
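
The relation between the two approximate index forms and the instantaneous formula (4) can be checked on a small example. The sketch below uses hypothetical prices and quantities for two commodities; the function names are illustrative. The first form weights by quantities at the earlier time and the second by quantities at the later time (these are the forms commonly called the Laspeyres and Paasche indices).

```python
def base_weighted_index(p0, p1, u0):
    """First form: sum p(t1)u(t0) / sum p(t0)u(t0)."""
    return sum(p * u for p, u in zip(p1, u0)) / sum(p * u for p, u in zip(p0, u0))


def current_weighted_index(p0, p1, u1):
    """Second form: sum p(t1)u(t1) / sum p(t0)u(t1)."""
    return sum(p * u for p, u in zip(p1, u1)) / sum(p * u for p, u in zip(p0, u1))


def relative_change(p, u, dp):
    """Right-hand side of formula (4): sum u dp / sum u p."""
    return sum(ui * dpi for ui, dpi in zip(u, dp)) / sum(ui * pi for ui, pi in zip(u, p))


# Hypothetical data for two commodities at times t0 and t1.
p0, u0 = [1.0, 2.0], [10.0, 5.0]
p1, u1 = [1.1, 2.0], [9.0, 5.0]

first_form = base_weighted_index(p0, p1, u0)
second_form = current_weighted_index(p0, p1, u1)
dp = [b - a for a, b in zip(p0, p1)]
step = relative_change(p0, u0, dp)

print(first_form, second_form, step)
```

The quantity `step` agrees with `first_form - 1`, which is the finite form of (4) obtained from the first formula; the second form differs slightly because the weights have changed between the two times.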


These formulae enable us to coordinate prices throughout a system
of economics and discuss their average motion. If we restrict our- 
selves to slow motion, we may consider the situation as one of moving 
equilibria—to use a term of H. L. Moore’s—and taking the simplest 
possible case, return to the simultaneous equations (1) and (2) as 
determining the price and amount produced (given by (3)) of a sample 
commodity. But we consider the quantities a, b, r, s no longer as
constants but as varying with the time in some accordance with the 
motion of prices as a whole. 

We shall have then, as differentials of quotients,

$$dp(t)=\frac{(r-a)\,d(b-s)-(b-s)\,d(r-a)}{(r-a)^2}, \qquad du(t)=\frac{(r-a)\,d(rb-as)-(rb-as)\,d(r-a)}{(r-a)^2} \tag{6}$$








and we may make hypotheses about the quantities da, db, dr, ds with
relation to the price index. It is reasonable, however, to expect
that there is a lag in the effect of prices as a whole, as indeed there
seems to be in many economic phenomena, and therefore to relate the
values of da(t), db(t), . . . at a time $t$ with the price index at a previous
time $t-T$. We may, if we prefer, relate them to the rate of interest
$i(t)$ at the time $t$, and this rate of interest to the price index $P(t-T)$ at
the previous time $t-T$; this latter form of relation is that suggested by
Professor Irving Fisher’s theory of the motion of prices as depending
on a rate of interest which lags behind its proper value in terms of
purchasing power.

¹ F. Divisia, Economique Rationnelle, Paris, 1928, p. 268; see also Evans, loc. cit., pp. 101-103.

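However, in order to make the simplest possible hypothesis, let us
suppose that

$$a(t)=\frac{\bar{a}}{P(t-T)} \tag{7}$$

where $\bar{a}$ is some negative constant, and retain the b(t), r(t), s(t) as
mere constants, $\bar{b}$, $\bar{r}$, $\bar{s}$, with $\bar{b}>0$, $\bar{r}>0$, $\bar{s}<0$.¹ In the hypothesis about
a(t) we state merely that the effect of a price depends on its ratio to the
price index, that is, that the demand is of the form

$$y(t)=\bar{a}\,\frac{p(t)}{P(t-T)}+\bar{b}.$$

Equations (6) then take the form

$$\begin{aligned}
dp(t)&=\frac{(\bar{b}-\bar{s})\,da(t)}{(\bar{r}-a(t))^2}=\frac{-\bar{a}(\bar{b}-\bar{s})}{(\bar{r}P(t-T)-\bar{a})^2}\,dP(t-T),\\
du(t)&=\bar{r}\,dp(t)=\frac{-\bar{a}\bar{r}(\bar{b}-\bar{s})}{(\bar{r}P(t-T)-\bar{a})^2}\,dP(t-T),
\end{aligned} \tag{8}$$

with

$$p(t)=\frac{(\bar{b}-\bar{s})P(t-T)}{\bar{r}P(t-T)-\bar{a}}, \qquad u(t)=\frac{\bar{r}\bar{b}P(t-T)-\bar{a}\bar{s}}{\bar{r}P(t-T)-\bar{a}}. \tag{9}$$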

Equations (8) and (9) may be regarded as those for a typical com-
modity. It may be noticed that, on account of the signs attributed
to the constants a, b, . . . , the coefficients of $dP(t-T)$ in (8) are all
positive. In (9), p(t) is obviously positive, and u(t) has to be positive
and therefore $\bar{r}\bar{b}P(t-T)-\bar{a}\bar{s}>0$ in order that the solution have refer-
ence to practical affairs; at any rate, u(t)>0.
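
The differential forms may be checked numerically against the price and quantity of the typical commodity. The sketch below assumes hypothetical constants satisfying the stated signs (a negative demand constant, positive b and r, negative s) and compares a finite-difference slope of p and u, taken with respect to the lagged index, with the closed-form coefficient $-\bar{a}(\bar{b}-\bar{s})/(\bar{r}P-\bar{a})^2$. The names are illustrative only.

```python
# Hypothetical constants obeying the sign conventions: a_bar < 0, b_bar > 0,
# r_bar > 0, s_bar < 0.
A_BAR, B_BAR, R_BAR, S_BAR = -2.0, 10.0, 1.0, -1.0


def price(P_lag):
    """Price of the commodity as a function of the lagged index P(t - T)."""
    return (B_BAR - S_BAR) * P_lag / (R_BAR * P_lag - A_BAR)


def quantity(P_lag):
    """Quantity produced as a function of the lagged index P(t - T)."""
    return (R_BAR * B_BAR * P_lag - A_BAR * S_BAR) / (R_BAR * P_lag - A_BAR)


def dp_coefficient(P_lag):
    """Closed-form coefficient of dP(t - T) in the expression for dp(t)."""
    return -A_BAR * (B_BAR - S_BAR) / (R_BAR * P_lag - A_BAR) ** 2


P = 1.0
eps = 1e-6
numeric_dp = (price(P + eps) - price(P - eps)) / (2 * eps)
numeric_du = (quantity(P + eps) - quantity(P - eps)) / (2 * eps)

print(numeric_dp, dp_coefficient(P))
print(numeric_du, R_BAR * dp_coefficient(P))
```

Both slopes come out positive and agree with the closed forms, and the slope of the quantity is the constant r times the slope of the price, as the second differential requires.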

If we treat all the other commodities of the system in the same way,
supposing that they satisfy similar hypotheses, we may substitute the
results in the formulae (4), (5), and obtain equations for the price
indices. Without calculating the explicit values of these expressions
we notice that they are of the form¹

$$dP(t)=F(P(t-T))\,dP(t-T), \qquad dU(t)=G(P(t-T))\,dP(t-T), \tag{10}$$

where F and G are two certain functions of $P(t-T)$, which are positive
and do not approach zero, in any case, as long as the problem remains
practical. But these equations determine dU(t) and dP(t) in terms of
quantities given at the earlier time $t-T$; and dU(t) and dP(t) both
have the same sign as $dP(t-T)$.

¹ Evans, loc. cit., p. 108.

If we know P(t) in an interval of time of length $T$, say from $t_0$ to
$t_0+T$, then (10) yield values of dP(t) and dU(t), and, by integration,
of P(t) and U(t), in the interval from $t_0+T$ to $t_0+2T$, and thus suc-
cessively at all later times, $t_0+2T$ to $t_0+3T$, . . . But, further, on
account of the algebraic signs of F and G, if P(t) happens to be an in-
creasing function of $t$ in the interval $t_0$ to $t_0+T$, both P(t) and U(t)
remain increasing functions thereafter; for dP(t) and dU(t) are of the
same algebraic sign as $dP(t-T)$. We have thus the ascending move-
ment of the economic cycle, both trade and prices increasing at the
same time, and the movement must continue until prices pass beyond
the point where the hypotheses are tenable, however high that may be.
The retrograde movement, once under way, has a similar character of
permanence, through the same equations (10), until, say, the produc-
tions come close to the lowest possible values, or the hypotheses fail
in some other way. Thus we have the characteristic processes which
lead to crises.
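
The step-by-step continuation from one interval of length T to the next can be imitated on a grid. The following is a minimal sketch, not the computation of the text: it assumes a constant positive factor in place of the function F, and the starting values are hypothetical. Each new increment of the index is the factor times the increment one lag earlier.

```python
def continue_index(initial, factor, periods):
    """Extend an index known on one lag interval to later intervals.

    `initial` holds P on a grid covering one interval of length T
    (len(initial) - 1 equal steps). Each later increment equals `factor`
    times the increment one lag earlier: dP(t) = factor * dP(t - T).
    A constant `factor` is a simplifying assumption.
    """
    steps = len(initial) - 1
    values = list(initial)
    for _ in range(periods * steps):
        lagged_increment = values[-steps] - values[-steps - 1]
        values.append(values[-1] + factor * lagged_increment)
    return values


def increments(values):
    """Successive differences of a series."""
    return [b - a for a, b in zip(values, values[1:])]


rising = continue_index([100.0, 101.0, 102.5], factor=1.1, periods=3)
falling = continue_index([100.0, 99.0, 97.5], factor=1.1, periods=3)

print([round(v, 3) for v in rising])
print([round(v, 3) for v in falling])
```

With a positive factor, an index that rises over the first interval rises in every later interval, and one that falls continues to fall. Passing a negative factor to the same function makes the direction reverse from each interval to the next.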

One might suggest a similar hypothesis about r(t), namely,
$r(t)=\bar{r}/P(t-T)$, on the ground that offer would perhaps be affected in the
same way as demand. In fact, the situation, analyzed with this
additional hypothesis, retains its behavior as far as P(t) is concerned.
The dU(t), however, is zero, since du(t) is zero (see (3)) in this case;
hence U(t) remains constant.
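
The constancy of the quantity can be verified by substituting both hypotheses into (3). In the short check below, $P$ stands for $P(t-T)$ and the barred letters for the constants of the hypotheses.

```latex
a=\frac{\bar a}{P},\quad r=\frac{\bar r}{P}
\;\Longrightarrow\;
u=\frac{rb-as}{r-a}
 =\frac{(\bar r\,\bar b-\bar a\,\bar s)/P}{(\bar r-\bar a)/P}
 =\frac{\bar r\,\bar b-\bar a\,\bar s}{\bar r-\bar a},
\qquad
p=\frac{b-s}{r-a}=\frac{(\bar b-\bar s)\,P}{\bar r-\bar a}.
```

The quantity is thus independent of the index, while the price is proportional to it.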

In the crises so far discussed, the cyclical movement has not been 
explicitly a part; the characteristic feature has been an instability 
which leads toward an extreme position where the customary proper- 
ties of the system fail to be preserved. These extreme positions are 
what we call crises. On the other hand, cyclical movements may 
occur in a system without a crisis—i.e., without the necessity of chang- 
ing hypotheses. Such movements need not reproduce themselves 
exactly during each successive period, for such a situation would ob- 
viously be a trivial special case, not realized in actual events. How- 
ever, it may be interesting to note that oscillatory movements may be 
obtained by simple hypotheses like those just given, in which price 
and trade indices preserve the character typical of what is usually 
called an economic cycle. That is, both increase at the same time or 
decrease at the same time. 






















We write for instance,

$$a(t)=\frac{\bar{a}}{P(t-T)}, \qquad b(t)=\frac{\bar{b}}{P(t-T)},$$

or

$$y(t)=\frac{\bar{a}\,p(t)+\bar{b}}{P(t-T)},$$

without changing $r(t)=\bar{r}$, $s(t)=\bar{s}$. In this case, we find again from
(6), $du(t)=\bar{r}\,dp(t)$, so that price and trade move in the same sense at
any given time, but

$$dp(t)=-\frac{\bar{b}\bar{r}-\bar{a}\bar{s}}{(\bar{r}P(t-T)-\bar{a})^2}\,dP(t-T),$$

so that

$$dP(t)=-F_1(P(t-T))\,dP(t-T),$$

where $-F_1$ is negative. Hence if P(t) increases from $t_0$ to $t_0+T$, it
decreases, and U(t) also, from $t_0+T$ to $t_0+2T$; and thus the P(t) and
U(t) alternately increase and decrease, together, in successive periods.

We have not attempted to give any complete account of economic 
crises and cycles. Indeed, we began by considering various effects 
only one at a time, in order to separate our difficulties, when they 
obviously operate simultaneously. Also, there are subsidiary phe- 
nomena whose influence we have not taken account of—such, for 
instance, as the effect of advertising.¹ Nevertheless, it is hoped that 
the results just given will be found significant. 


DISCUSSION 


By Henry Schultz 


It is a notorious fact that investigators of business cycles have made 
little or no use of economic theory. Working in the field in which lack 
of equilibrium is the prevailing condition rather than the exception, and 
finding that the principal treatises on economics center around the 
notion of static equilibrium, they have refused to make their own 
theoretical ascensions in the captive balloon of the received theories. 

And yet it is generally agreed that business cycle phenomena should 
not be divorced from general economic theory. Certainly the better 
understanding which Professor W. C. Mitchell has given us of the 
actual behavior of business cycles, the vast accumulation of statistical 


¹ Evans, loc. cit., p. 12. If an increase of profit is possible with a given expense of advertising 2, it will 
be increased more and more rapidly with an increase of that expense. 



















data, and the improvement in the statistical technique which have 
taken place in the last two decades should be a challenge to all thinking 
men to put an end to this cleavage between our theories and the facts of 
economic life. 

In 1914, one year after the publication of the first edition of Mitchell’s 
Business Cycles, Professor Josef Schumpeter of Bonn presented a ten- 
tative theory of dynamic economics (Theory of Economic Development) 
in which business cycles became an integral part of his general theory. 
Others followed. Today we have explanations of business cycles 
which stress the non-rational behavior of human beings and the anarchy 
of production; explanations which emphasize the lag of one group of 
economic changes behind another; explanations which rely on factors 
outside the economic system such as climate; and many others. 

Professor Evans’ theory of crises, which is in a sense also a theory of 
business cycles, differs from these in the following respects: (1) It is 
mathematical and quantitative. (2) It is based on dynamic functions 
of demand and supply. (3) It can be tested statistically. 

(1) By developing a mathematical theory, Professor Evans has 
gained advantages on which we need not dwell before an audience of 
this type. 

(2) By introducing the rate of change of price with respect to time 
into his demand and supply functions 

$$y(t) = a\,p(t) + b + h\,\frac{dp(t)}{dt}, \qquad u(t) = r\,p(t) + s + d\,\frac{dp(t)}{dt}, \qquad (1)$$

where a<0, b>0, h>0; r>0, s<0, d>0, he has made his problem one 
in economic dynamics, not in economic statics. 

(3) By carrying out the implications of his assumptions, he has de- 
duced equations which describe the behavior of prices and quantities 
through time and has classified them into significant types. 
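One simple implication of demand and supply functions of this kind can be worked out directly. The sketch below assumes that the market clears at every instant (demand equals supply), which turns the pair of functions into a first-order linear differential equation in price with an exponential solution. The numerical constants are hypothetical, and the example is meant only as an illustration of the mechanism, not as a reproduction of Professor Evans' own classification.

```python
import math


def equilibrium_and_rate(a, b, h, r, s, d):
    """Setting a*p + b + h*p' equal to r*p + s + d*p' gives
    p' = lam * (p - p_e), with the two constants returned here."""
    p_e = (b - s) / (r - a)      # stationary price
    lam = (r - a) / (h - d)      # rate of departure (lam > 0) or return (lam < 0)
    return p_e, lam


def price_path(p0, t, a, b, h, r, s, d):
    """Closed-form solution p(t) = p_e + (p0 - p_e) * exp(lam * t)."""
    p_e, lam = equilibrium_and_rate(a, b, h, r, s, d)
    return p_e + (p0 - p_e) * math.exp(lam * t)


def excess_demand(p, dp, a, b, h, r, s, d):
    """Demand minus supply at price p and rate of change dp."""
    return (a * p + b + h * dp) - (r * p + s + d * dp)


if __name__ == "__main__":
    consts = dict(a=-2.0, b=20.0, h=0.5, r=1.0, s=-1.0, d=2.0)  # hypothetical
    print(equilibrium_and_rate(**consts))
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, price_path(10.0, t, **consts))
```

Since r - a is positive under the stated sign conditions, the sign of h - d alone decides whether the price moves away from its stationary value or returns to it.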

Working with the same dynamic equations of demand and supply, 
Professor C. F. Roos, a pupil of Professor Evans, has deduced an inter- 
esting set of equations descriptive of price and production fluctuations 
under various conditions.¹ As some of these equations contain both 
exponential and periodic terms, they should be of interest to students of 
economic trends and cycles. 

Is Evans’ theory a reasonable theory? Is it in agreement with the 
facts? On this question the following observations suggest themselves. 


¹ “A Mathematical Theory of Price and Production Fluctuations and Economic Crises,” Journal of 
Political Economy, Vol. 38, No. 5, October, 1930, pp. 501-522. 










(1) To fit the price-time and the quantity-time curves which are de- 
duced from his theory to the observed fluctuations of prices and quan- 
tities through time is not a good test of his theory, for there may be 
many other demand and supply functions which by their shifting 
would give rise to the same curves of prices and quantities. 

(2) Probably the fairest test of Evans’ theory is to see whether his 
demand and supply functions, which constitute the cornerstone of his 
theory, are in better agreement with the observations than are other 
demand and supply curves which can be used. I find that for such 
consumers’ goods as corn, hay, wheat, sugar, potatoes, oats, barley, 
rice, rye and buckwheat, the demand is quite independent of the time- 
rate of change of price, so that Evans’ assumption is not applicable to 
this class of commodities. (This statement is based on an analysis of 
the consumption and prices of these commodities from 1875 to date.) 
For most of these commodities I find that if we confine our attention 
to a period not exceeding, say, twenty years, the demand curve may 
be represented excellently either by 


$$y = a + bp + ct \qquad (3)$$

or by 

$$y = A\,p^{\alpha} e^{\beta t} \qquad (4)$$

where y equals consumption, p equals price and t equals time. Thus, 
using the per capita consumption of sugar in pounds (y) and the deflated 
wholesale price of sugar in cents at New York (p) for 1896-1914, I find 
that 

$$y = 117.5\,p^{-0.2717\,(\pm 0.0844)}\,e^{0.0124\,(\pm 0.0022)\,t} \qquad (5)$$

the origin of t being at 1905. The standard errors of the parameters 
[which are also given in (5)] are relatively small. The coefficient of 
multiple correlation [corrected for the number of parameters in (5)] 
is $R'_{y.pt}=0.97$. The quadratic mean error ($\epsilon$) is 2.7 per cent; that 
is, if the distribution of the points about the plane (5) be normal, 
approximately 68 per cent of them fall within ±2.7 per cent of the plane. 
The coefficient of elasticity of demand is 

$$\frac{dy}{dp}\cdot\frac{p}{y} = \alpha = -0.27 \pm 0.08,$$

or an increase in the price of sugar of 1 per cent is, on the average, 
associated with a decrease in consumption of 0.27 of 1 per cent. 

The demand curve, however, does not remain fixed. It shifts its 
position upward and to the right at the average rate of 1.2 per cent 
per annum, for 

$$\beta = 0.0124 \pm 0.0022.$$
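A curve of the constant-elasticity type with an exponential time shift becomes linear after taking logarithms, so its three constants can be obtained by ordinary least squares. The following is a minimal sketch of such a fit in plain Python. The price series in the usage section is synthetic and hypothetical (it is not the sugar data), and the consumption figures are generated from the point estimates quoted in (5) simply to confirm that the fit recovers them.

```python
import math


def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for c in range(col, n + 1):
                A[k][c] -= f * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def fit_demand(prices, years, consumption, origin):
    """Least-squares fit of ln y = ln A + alpha*ln p + beta*(year - origin).

    Returns (A, alpha, beta): alpha is the elasticity of demand and
    beta the proportional shift of the curve per year.
    """
    X = [[1.0, math.log(p), float(yr - origin)] for p, yr in zip(prices, years)]
    z = [math.log(y) for y in consumption]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(3)]
    ln_A, alpha, beta = solve_linear(XtX, Xtz)
    return math.exp(ln_A), alpha, beta


if __name__ == "__main__":
    years = list(range(1896, 1915))
    # Hypothetical deflated prices; NOT the observed New York sugar prices.
    prices = [4.0 + 0.5 * math.sin(1.3 * k) + 0.03 * k for k in range(19)]
    consumption = [117.5 * p ** -0.2717 * math.exp(0.0124 * (yr - 1905))
                   for p, yr in zip(prices, years)]
    print(fit_demand(prices, years, consumption, origin=1905))
```

With real observations the fit would of course leave residuals, and the standard errors of the constants would have to be computed from them; that step is omitted here.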




















When extrapolated beyond 1914, this function (5) gives remarkably 
good forecasts of the per capita consumption, showing that the routine 
of change in demand which prevailed between 1896 and 1914 still holds. 

The demand for each of the other commodities is also excellently 
described by (3) or (4). With the exception of buckwheat, the lowest 
correlation between the computed and the observed consumption of 
the nine commodities is 0.81. 

Evans’ formula can, therefore, hardly be an improvement over (3) 
or (4) as long as we confine ourselves to the representative consumers’ 
goods and to such “normal” periods as that of 1896-1914. It may 
perhaps account for some of the residuals obtained, but better results 
will probably be obtained by introducing such additional variables as 
the prices of competing goods, etc. 

(3) On the applicability of Evans’ demand function to producers’ 
goods, I am not prepared to speak with much confidence because I have 
not had much experience with it. The fundamental difficulty with the 
analysis of producers’ goods is that no one has, as yet, succeeded in 
getting a negatively-sloping curve from the statistics of the sales and 
prices of any of these goods. All of the well-known procedures, such 
as multiple correlation, trend ratios, link relatives, etc., yield positively 
sloping curves when applied to producers’ goods. One would suppose, 
therefore, that Evans’ demand function is ideally suited to represent 
the demand for producers’ goods since it makes the quantity demanded 
a function not only of price but also of the rate of change of price, thus 
permitting demand to increase with price. Using the first differences 
of prices for $dp/dt$, I fitted this function to the monthly production and 
prices of pig iron for the three periods, (1) December, 1898, to December, 
1899, (2) December, 1901, to November, 1902, and (3) March to 
December, 1912, in each of which the movements of prices and production 
are positively correlated with one another. The results are 
disappointing. In no case was the positive correlation between prices 
and quantities changed to a negative correlation through the introduction 
of the additional variable $dp/dt$. 

Of course, it may be argued that what I have obtained is Evans’ 
supply function, not his demand function. But then we must be told 
under what circumstances equation (1) may be expected to give better 
results than those yielded by some such empirical curves as (3) or (4). 

Furthermore, it is fair to raise the question whether in a dynamic 
law of demand the slope of the demand function is always negative 
(a<0). Only a few weeks ago we read in the newspapers and in the 










technical journals that several subsidiaries of the United States Steel 
Corporation raised the price of steel with the expressed desire of stimu- 
lating demand thereby. Does this not suggest a positive demand func- 
tion for pig iron—at least within certain limits? 

In short, Evans’ equation is probably not applicable to consumers’ 
goods. The demand for these goods is normally independent of the 
time-rate of change of price. Whether it is applicable to producers’ 
goods in its simplest form or in a modified form, must be determined by 
inductive investigations. Perhaps the conclusion that will emerge 
from these investigations is that the demand curves for producers’ 
goods must be assumed: that they cannot be deduced statistically. 
In any event, Professor Evans’ researches will have thrown light on a 
very dark corner in economic theory and will have blazed a path for 
mathematicians, economists and statisticians. 
















A METHOD OF DECOMPOSING AN EMPIRICAL SERIES 
INTO ITS CYCLICAL AND PROGRESSIVE COMPONENTS 


By Ragnar Frisch 


In the last decade’s intensive study of all sorts of social and economic 
time series, it has become clear, it seems to me, that the usual time 
series technique is not quite adequate for the purpose which the social 
investigator is pursuing. The technique which is now most in vogue 
does not seem powerful enough to deal with the more complicated 
situations which arise when the time series studied represents an inter- 
ference phenomenon between several components: short cycles, long 
cycles, different orders of trends, etc., and when, furthermore, the cycli- 
cal or progressive characteristics of these various components are 
changing. 

Of course, in any time series there are always certain intrinsic fea- 
tures (the relative importance of the erratic element, the degree of 
complexity in the nature of the underlying components, etc.) which 
put an absolute, so to speak “natural,” limit to the amount of significant 
information which it is possible to obtain from the series. But 
although no omnipotent technique can be constructed, yet the tech- 
nique is not a matter of small importance. If the method of analysis 
is inadequate we may be forced to give in long before the natural 
limit of significant information is reached. And I believe that in this 
respect there is still room for considerable progress over the orthodox 
time series methods. 

The present paper summarizes very briefly some points of an at- 
tempt I have engaged upon to push the technique in this field a little 
step forward. The present statement will give nothing more than a 
general discussion of the nature of the problem and some hints at the 
character of the tools I am using. The whole subject will be discussed 
in more detail, and a series of numerical applications given in a mono- 
graph shortly to be published. A preliminary statement of my ap- 
proach to the problem was mimeographed in April, 1927, and through 
the courtesy of Professor Wesley C. Mitchell and the Rockefeller Insti- 
tution circulated to a list of economists and statisticians. Subsequently 
the theory was considerably generalized and a condensed statement of 
it was published in the Skandinavisk Aktuarietidskrift, 1928. The 
principles developed in this paper have formed the basis of an exten- 
sive numerical work on actual and on “manufactured” series which 
has been going on in my seminar at Yale the last semester. During 
this work considerable improvements in detail and in the practical 







adaptation have, of course, been made, but there has been found no 
ground for a modification of the basic principles involved. 

In the analysis of time series we may, roughly speaking, consider 
the following four groups of problems: 

(1) The decomposition of a given time series. We want to find out 
on more or less empirical grounds what is actually present in the series 
at hand, that is to say, what sort of components the series contains. 

(2) The comparison between different series. We want to compare 
a certain component in one series with the corresponding component in 
another series, or more generally, we want to compare a set of com- 
ponents in one series with certain components in other series. 

(3) The explanation problem. When we have found that a given 
series contains certain components, we ask the further question: How 
did these things come into the series? In a sense, this is the crucial ques- 
tion of time series analysis. It is only the answer to this question that 
can give the ultimate significance test for the observed components. 
But answering such a question means working out a whole rational 
explanation of the phenomenon at hand. This is not a question of 
time series technique any more, but a question of the whole content 
of the theory of the particular phenomenon at hand. Even before 
such a theory is worked out it may be fruitful to try to give some an- 
swer to the simpler question: what is actually present in the series? 
Time and again we have seen in the natural sciences, as well as in the 
social sciences, that such a more or less empirical attack on the prob- 
lem has suggested a new and fruitful starting point for the theoretical 
research. This simpler, technical question is the decomposition prob- 
lem, listed above as problem No. 1. 

(4) Finally, we have the problem of forecasting, which, if it shall 
have anything to do with science at all, must be based on a thorough 
understanding of all the foregoing three questions. 

It is only the first of these four problems, namely, the decomposi- 
tion problem, that shall be considered in the present paper. So far 
I have purposely avoided any attempt to enter into the field of fore- 
casting. I believe that no systematic and reliable forecasting will be 
possible until we have obtained more knowledge about the real nature 
of all the various sorts of cycles and trends whose composite effect 
is shown in our time series. 

In my approach to the decomposition problem there are in particu- 
lar two things I have had in view. First, I have wanted to develop a 
method that is more flexible than the usual methods of curve fitting. 
To take an example: a compound interest curve, a polynomial or some 
other specific formula fitted as a long time trend to an economic time 
series, will, as a rule, give a good fit for a while and then it will shoot 







entirely out of the data. In such a case it is customary to speak of a 
“break in the trend.” To me, most of these cases are rather examples 
of a breakdown in the trend method than a real break in the trend of 
the data. It is true that there do occur cases where a real break in 
trend takes place, but these cases are extremely rare. In most cases 
the so-called break in trend is only an apparent thing due to the arti- 
ficial rigidity of the method. What we need in order to take care of 
situations of this kind is a method which has the same sort of flexibility 
features as a moving average, but which is more refined, and furthermore 
constructed so as to deal with the complicated situations that 
arise when our time series represents the composite effect of several 
components. 

Second, comparing our desiderata with what we obtain by the peri- 
odogram analysis, I have wanted to develop a method which is such 
that it will actually give components that we can see. In the periodo- 
gram analysis each cycle is represented by a sort of index number, 
namely, the magnitude of the Fourier coefficient of the corresponding 
trial period. If this index is high, we take it as indicating that this 
sort of cycle is strongly present, and if the index is small, we consider 
it as indicating a lack of evidence of the presence of this sort of cycle. 
Of course, we may also by the classical harmonic analysis procedure 
determine the phase of the cycle in question and from a trigonometric 
table plot a sine curve with the specified properties. But this is not 
tracing the component in the sense which I have in mind. I am think- 
ing of a procedure which would make it possible to trace a given com- 
ponent in its actual historical course so that we can compare a given 
historical swing in the component in question with the next swing in 
the same component. In many sorts of data, and particularly in 
economic data, it is quite obvious that the cyclical character of a given 
component is not constant. We may, for instance, have a component 
that is changing with respect to the length of the period. In the work 
done in my seminar at Yale we found, for instance, a characteristic 
lengthening in the zero distance of the “40 month” cycle in the beginning 
of the war period. This is very reasonable, I think, on a priori 
grounds: the unsettlement occurring with the beginning of the War, 
moratoria, etc., made the business men adopt a policy of “wait and 
see” which must necessarily have lengthened the zero distance in 
question. In other cases we may have a change in the phase of the 
cycle or a change in the amplitude, and so on. And these things are 
exactly some of the most interesting and fundamental things to study. 
But, it is impossible to modify the periodogram analysis into a 
truly moving method that would permit such a continuous historical 
observation of each component, because the periodogram gives sig- 



































nificant information only when the range for which it is constructed 
covers a great number of the cycles in question.¹ 

A method by which it shall be possible to trace the historical course 
of each component must necessarily to a large extent be based on the 
local properties of the given curve instead of on its total properties. 
The nature of the cycles and the trends and other components will 
have to be determined in each point of time primarily by taking ac- 
count of the properties of the curve in the vicinity of this point, 
that is to say, by taking account of the slope, convexity, etc. Of 
course, in an actual case it is impossible to carry this principle to the 
extreme. We must make a compromise and seek our information not 
only in the strictly infinitesimal vicinity of the point considered, but 
also seek it at some short distance from this point. We have to adopt 
a practical interpretation of the notion of “locality.” For instance, 
we have to revert to finite differences instead of true differentials. And 
if the erratic element is heavy, we have to perform a mechanical 
smoothing, or in some other equivalent way modify the operation per- 
formed on the series so as to make it extend over a certain length of 
time. In this respect the operations developed below are perfectly 
general. According to the nature of the problem at hand they may 
be squeezed in on a short interval of time or extended to a longer inter- 
val. They may, for instance, be extended over a very long interval 
and constructed in such a way as to give the same sort of information 
that the periodogram analysis can give. 

The principal tools used in my approach to the decomposition prob- 
lem are linear operations. By a linear operation in this connection I 
simply mean a moving total with constant (positive or negative) 
weights. If u(t) is a function of time which is known in a set of 
equidistant points, and if $\Omega_1, \Omega_2, \ldots, \Omega_n$ are constant weights 
independent of time, the linear operation $\Omega$ is defined by 

$$\Omega u(t) = \sum_{i=1}^{n} \Omega_i\, u\!\left(t + \left(i - \frac{n+1}{2}\right)\delta\right),$$

where $\delta$ is the distance between consecutive observations. In the 
general case no assumption needs to be made about the symmetry of 
the weights. As a rule, however, it will be found convenient to consider 
symmetric weights, that is, weights such that $\Omega_i=\Omega_{n-i+1}$. 
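A moving total with constant weights is easily put into code. The sketch below is an illustration only: it assumes unit spacing between observations, an odd number of weights, and a window centered on the point considered (the centering is an assumption of this example). The function names are hypothetical.

```python
def linear_operation(u, weights):
    """Apply a moving total with constant weights, centered on each point.

    Returns a list of the same length as u. Points too close to either
    end for the full window to fit are reported as None.
    """
    n = len(weights)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of weights")
    half = n // 2
    out = [None] * len(u)
    for t in range(half, len(u) - half):
        out[t] = sum(w * u[t - half + i] for i, w in enumerate(weights))
    return out


def is_symmetric(weights):
    """True when the i-th weight equals the (n - i + 1)-th weight."""
    return all(w == weights[-1 - i] for i, w in enumerate(weights))


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5]
    print(linear_operation(series, [1, 1, 1]))    # simple moving total
    print(linear_operation(series, [1, -2, 1]))   # second difference
    print(is_symmetric([1, -2, 1]))
```

Weights may be negative as well as positive, so that ordinary differences and ordinary moving totals are both special cases of the same operation.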

The use of such linear operations in the study of time series is, of 

¹ I am speaking here of the classical form of periodogram analysis. Professor Wiener of the 
Massachusetts Institute of Technology has told me that he has recently developed a generalized periodogram 
analysis (Acta Mathematica, 1930) which will overcome these difficulties. I have not yet had an 
opportunity to study Wiener’s method carefully, but I have some doubt as to the possibility of actually 
tracing the historical course of each component by constructing a periodogram. I shall revert to this 
question in my forthcoming monograph on the decomposition of empirical series. 











course, in itself not new. I believe, however, that the systematic 
way in which I try to utilize them is somewhat novel. I try, for 
instance, to connect these linear operations with the very definition 
of the notion of a “component” in a given time series. If no assumption 
whatsoever is made regarding the nature of the components, then 
the decomposition problem has no sense. In this case we may pos- 
tulate the existence of n perfectly arbitrary “components” and make 
this fit in with the given series, simply by determining an (n+1)th 
component equal to the deviation of the given series from the cumula- 
ted effect of the n postulated components. The assumptions by which 
I give a meaning to the notion of “component” are built on the 
idea that each component shall represent something oscillating around, 
or departing for good, from a point of equilibrium. By formulating this 
assumption in terms of the approximate effect which a linear opera- 
tion will have on such a component, certain rules for the use of these 
linear operations in order to determine the components are developed. 

One of the problems studied in this connection is the general 
amplification problem. I study certain types of operations $\Omega$ that will 
knock out progressive components with a small convexity, and other 
operations that will knock out small wave-like components, and at 
the same time amplify cyclical components with periods falling within 
a certain more or less definitely defined range. In many cases, this 
process is already quite sufficient to isolate a given sort of cyclical 
component and exhibit its historical course. In other cases there may 
be two cycles that are too similar with respect to wave length to make 
it possible to isolate them one at a time by such a simple amplification. 
In this case the problem may be attacked by certain propositions ob- 
tained from the study of the effect on the composite graph produced 
by “twin cycles.” Or the problem may be attacked by using certain 
linear operations which we may call the key operations. 
These are certain simple linear operations that are iterated a number 
of times and from the results of which is formed a certain algebraic 
equation, the key equation, whose roots will give information about 
the length of the periods in the case of cycles or of the yearly progres- 
sivity in case of non-cyclical trends. From the knowledge of these 
characteristics the conclusion as to the other characteristics of the 
components is easy. This analysis may be carried through either on a 
local basis, determining the characteristics of the components sepa- 
rately in each point of time and thus obtaining a moving determina- 
tion of the components, or it may be carried through on the basis of 
the total properties of the curve. The key equation procedure may 
also be extended to the simultaneous determination of three or more 
components. 
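The way in which a linear operation damps some components and passes or amplifies others can be illustrated with a small calculation. The sketch below assumes a centered operation with symmetric weights, an odd number of weights, and unit spacing between observations. Under those assumptions a cosine of a given period is reproduced by the operation multiplied by a constant factor, here called the gain (a label introduced for this example).

```python
import math


def gain(weights, period):
    """Factor by which a centered, symmetric operation multiplies a cosine
    whose period is given in units of the observation interval."""
    half = len(weights) // 2
    omega = 2 * math.pi / period
    return sum(w * math.cos(omega * (i - half)) for i, w in enumerate(weights))


def apply_centered(u, weights, t):
    """Value of the moving total of u at position t (window centered on t)."""
    half = len(weights) // 2
    return sum(w * u[t - half + i] for i, w in enumerate(weights))


if __name__ == "__main__":
    second_difference = [1, -2, 1]
    for period in (4, 12, 40, 120):
        print(period, round(gain(second_difference, period), 4))
    # A straight-line trend is removed entirely by the second difference.
    trend = [3 + 2 * t for t in range(10)]
    print(apply_centered(trend, second_difference, 4))
```

The second difference annihilates a straight line and leaves long waves with a gain close to zero, while short waves are passed with a gain of order one. A moving total of three equal weights, on the other hand, has zero gain at a period of three intervals and so removes a cycle of exactly that length.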



















The first to utilize the idea of a key equation seem to have been Fr. Kühnen 
and H. Bruns.¹ The key equation processes of these earlier writers, 
however, were not built on the notion of a general linear operation, 
but on a particular form of such operations, namely, either successive 
differences or successive point selections. Therefore their theory 
could not be based on a systematic manipulation of the key opera- 
tions and of the amplification operations so as to have the equation 
work under the most favorable circumstances, which is the essence 
of my approach. This, I believe, explains why their method never 
had the success of being adopted in practical work to any large extent. 
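The general idea of an equation whose root reveals the period can be shown in its very simplest form. For a single undamped sine wave observed at unit intervals, the sum of the two neighbors of any observation equals that observation multiplied by 2 cos(2π/period). The sketch below estimates this multiplier by least squares and solves for the period. It is an illustration of the underlying idea only, under the assumption of one pure cycle with no trend and no erratic element, and is not a reproduction of any of the procedures cited.

```python
import math


def estimate_period(u):
    """Estimate the period of a single sine wave from the recurrence
    u[t+1] + u[t-1] = k * u[t], where k = 2*cos(2*pi/period)."""
    inner = range(1, len(u) - 1)
    num = sum(u[t] * (u[t + 1] + u[t - 1]) for t in inner)
    den = sum(u[t] ** 2 for t in inner)
    k = num / den
    if abs(k) >= 2:
        raise ValueError("no real period: the series is not a pure cycle")
    return 2 * math.pi / math.acos(k / 2)


if __name__ == "__main__":
    # Hypothetical series: a 40-interval cycle with arbitrary amplitude and phase.
    series = [3.0 * math.cos(2 * math.pi * t / 40 + 0.7) for t in range(120)]
    print(estimate_period(series))
```

Neither the amplitude nor the phase enters the estimate, which is why a relation of this kind can be used before anything else is known about the component.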

One aspect of the problem to which I attach great importance is 
what might be called the Slutsky effect. This is the fact that linear op- 
erations applied to a random variable may produce fluctuations of a 
more or less cyclical character. I study the laws of such spurious 
cycle creation and show in particular that it is possible to construct 
operations $\Omega$ which, when applied to a random variable, will 
produce nearly rigorous sine curves, the period and amplitude (but 
not the phase) of which can be predicted almost exactly when the 
operation $\Omega$ is known. I call this the Slutsky effect because I believe the 
Russian statistician, Eugen Slutsky, was the first to take up this 
effect in a systematic study. My results go somewhat further than 
Slutsky’s. In particular, I derive coefficients by which the amplitude 
of observed cycles may be compared with the amplitude of spurious 
cycles created through the Slutsky effect. The knowledge of the laws 
of spurious cycle creation thus obtained may be combined with the 
key equation approach in a manner which goes a long way toward 
eliminating the spurious cycles. The procedure simply consists in, 
so to speak, setting aside one root of the key equation to take up the 
spurious effect. This has proved rather effective, particularly if the 
key equation is formed on a local basis. 
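The creation of apparent waves by a linear operation on purely random numbers can be observed with a few lines of code. The sketch below applies a five-term moving total twice to independent random draws and compares the correlation between neighboring values before and after. The choice of operation, the series length, and the seed are arbitrary assumptions of this example; it illustrates the effect only and does not attempt the amplitude coefficients mentioned above.

```python
import random


def moving_sum(x, n):
    """Unweighted moving total of n consecutive terms."""
    return [sum(x[i:i + n]) for i in range(len(x) - n + 1)]


def lag1_autocorr(x):
    """Correlation between each value and its successor."""
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
    smoothed = moving_sum(moving_sum(noise, 5), 5)
    print("raw series:     ", round(lag1_autocorr(noise), 3))
    print("after operation:", round(lag1_autocorr(smoothed), 3))
```

The two passes are equivalent to a single operation with weights 1, 2, 3, 4, 5, 4, 3, 2, 1, for which the theoretical correlation between neighboring values is 80/85, or about 0.94. A series with so high a serial correlation moves in smooth swings which are easily mistaken for genuine cycles.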

If the cyclical components in the series studied are not too similar 
with respect to wave length, certain phases of the procedure here 
indicated may be worked out graphically, utilizing the inflection 
points of the given curve as it appears after a graphical smoothing. 
In this way, a person trained in graphical smoothing may perform a 
very rapid rough analysis of what the given curve contains. I have 
received information telling that this graphical process in the form 
explained in my first paper has been applied also in other quarters 
with satisfactory results. 


¹ Fr. Kühnen, Astronomische Nachrichten, 1909; H. Bruns, ibid., 1911. Later the idea has been used 
by J. I. Craig, Monthly Notices of the Royal Astronomical Society, 1916, and G. U. Yule, Philosophical 
Transactions of the Royal Society of London, 1927. 


































RECENT IMPROVEMENTS IN STATISTICAL INFERENCE 


By Harold Hotelling 


The great extension in the use of statistics in the last two decades has 
been associated with and largely made possible by mathematical devel- 
opments based upon the theory of probability. Of course, a great deal 
of what is called “Statistics” consists of records of numerical observations 
which apparently involve no mathematics more complicated than 
addition, or at the most, division. When, however, we pass from facts 
to inferences, which supply, after all, the real motives for gathering the 
facts, we speedily encounter questions whose answers tax the resources 
of modern mathematical research, and in many cases call insistently for 
the solution of purely mathematical problems which no one has yet 
been able to solve. Recent developments in the theory of statistics, 
while providing satisfactory answers to a great many old questions, 
have raised a host of new ones—many of them of great practical impor- 
tance—which have caught the mathematicians unprepared to deal with 
them. 

A great diffusion of statistical methods into other fields of thought 
than biometrics and errors of observation began in earnest with the 
present century. The Statistical Mechanics of J. Willard Gibbs was 
published in 1902, and is today being followed by extremely important 
developments in sub-atomic physics and in chemistry. At the same 
time English investigators such as Hooker and Yule were applying the 
new method of correlation to the comparison of rainfall with crop yield, 
of the marriage rate with trade, and to other subjects of wide interest. 
Economic and social research has come more and more to bristle with 
correlation coefficients and other appurtenances of modern statistical 
theory. Psychology found the study of individual differences possible 
on a large scale when these new tools became available. The efforts of 
Spearman and Kelley to probe the hidden structure of the human mind 
by means of “tetrad differences” have raised difficult mathematical 
problems which have been partly solved by these writers, by John 
Wishart, by Karl Pearson and others, but which are still in a state in 
which certain purely mathematical discoveries would contribute sub- 
stantially and directly to the applied study. 

A new era in the theory of statistics began in 1915 with the publica- 
tion in Biometrika by R. A. Fisher of a mathematical study of the dis- 
tribution in random samples of the correlation coefficient. For the 
practical use of a correlation coefficient, or any other function of ob- 


























servations, it is absolutely essential to have some idea of the sampling 
distribution in order to judge whether an observed apparent relation is 
real or is due only to chance. A rough idea of a sampling distribution is 
obtained when we know the probable error, or standard error, provided 
the distribution is approximately of the normal or “Gaussian” type, as 
is often the case. The formula $(1-r^2)/\sqrt{n}$ had been obtained by Pearson 
and Filon in 1899 as an approximation to the standard error of the 
correlation coefficient, r. To many social and biological investigators 
this old formula is still the ultimate test of the significance of the correla- 
tion coefficient. However, since r is necessarily confined between the 
limits —1 and +1, its distribution can never be exactly normal, and in 
many cases, particularly with small samples and high correlations, is 
removed very far indeed from the normal type. Fisher found the exact 
distribution. Later studies have provided convenient approximations 
for practical use. 
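The difficulty with the old formula for small samples and high correlations can be seen numerically. The sketch below compares an interval of r plus or minus 1.96 standard errors, using (1 - r²)/√n, with an interval based on the transformation z = atanh(r), whose standard error is approximately 1/√(n - 3). The z-transformation is chosen here as one example of the convenient approximations just mentioned; the sample values r = 0.9 and n = 10 are hypothetical.

```python
import math


def pearson_filon_se(r, n):
    """Large-sample standard error (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)


def normal_theory_interval(r, n, crit=1.96):
    """r plus or minus crit standard errors, treating r as normally distributed."""
    se = pearson_filon_se(r, n)
    return r - crit * se, r + crit * se


def z_transform_interval(r, n, crit=1.96):
    """Interval computed on the scale z = atanh(r) and mapped back to r."""
    z = math.atanh(r)
    half = crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


if __name__ == "__main__":
    r, n = 0.9, 10   # hypothetical small sample with a high correlation
    print(pearson_filon_se(r, n))
    print(normal_theory_interval(r, n))
    print(z_transform_interval(r, n))
```

In this case the first interval extends beyond +1, which no correlation can reach, while the second is asymmetrical about r and remains within the admissible limits.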

Fisher had been preceded by “Student” in the study of the distribution 
of the correlation coefficient, and also in the investigation of the 
very important distribution of the ratio of a mean to the estimate of its 
standard error derived from the same observations. The problem of 
the significance of means had been dealt with by Gauss and others on 
the assumption that the standard deviation of the population sampled 
could be estimated exactly by means of the standard deviation of the 
sample. This assumption is good enough for extremely numerous 
samples, though how numerous the sample must be can only be deter- 
mined with the help of the exact distribution which was discovered by 
“Student.” A knowledge of “Student’s” distribution makes possible
the use of small samples whose significance could not otherwise be 
known. 
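
How far the normal assumption misleads with few observations may be
judged from the following sketch, which evaluates the tail area of
“Student’s” distribution by numerical integration and compares it with
the normal tail. The function names and the choice of Simpson's rule
are assumptions of this illustration.

```python
import math

def t_density(t, df):
    """Density of "Student's" ratio with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_tail(t, df, steps=20000):
    """P(|T| > t), from Simpson's rule applied to the area between 0 and t."""
    h = t / steps                      # steps must be even
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3   # area between -t and +t
    return 1 - central_area

def normal_two_sided_tail(z):
    """P(|Z| > z) for the normal distribution."""
    return math.erfc(z / math.sqrt(2))

# A ratio of 2 is far less remarkable with five observations than with many.
for df in (4, 9, 29, 499):
    print(df, round(two_sided_tail(2.0, df), 4))
print("normal", round(normal_two_sided_tail(2.0), 4))
```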

By 1925 Fisher had extended the application of “Student’s” distribution
to the difference between the means of two samples, and to testing
the significance of coefficients of regression or trend lines found by the 
method of least squares. For example: it is possible by means of 
“Student’s” distribution to determine whether the apparently more 
rapid secular increase of one variable than another is significant, even 
though the series from which the rates of increase are calculated may be 
very short. These uses of “Student’s” distribution are now being
brought to bear upon economic and agricultural problems by such 
investigators as Holbrook Working and Mordecai Ezekiel. 
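
The test of a trend coefficient may be sketched as follows. The slope
of a least-squares line is divided by its estimated standard error, and
the ratio is referred to “Student’s” distribution with n − 2 degrees of
freedom. The five-year series used here is invented for the
illustration.

```python
import math

def slope_t(xs, ys):
    """Least-squares slope, its standard error, the ratio t, and the degrees of freedom."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the trend line
    a = my - b * mx                    # intercept
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    df = n - 2                         # two constants were fitted
    se = math.sqrt(sse / df / sxx)     # standard error of the slope
    return b, se, b / se, df

years = [0, 1, 2, 3, 4]                # hypothetical short series
series = [1.0, 3.0, 2.0, 5.0, 4.0]
b, se, t, df = slope_t(years, series)
print(b, se, t, df)
```

With so short a series the ratio must be compared with the tabled
values for three degrees of freedom, not with the normal table.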

The past decade has been particularly fruitful in contributions to 
theoretical statistics. The full utility of most of these discoveries 
must wait until they are more widely known. Besides Fisher and 
those associated with Karl Pearson’s Biometrical Laboratory, there 
have been many other contributors. The important work on small 
samples has recently been reviewed by P. R. Rider in the Annals of 
Mathematics. Passing over the contributions to mathematical proba- 
bility made by continental European and American writers, I shall speak 
only of two groups of investigations: those relating to time series, and 
those by R. A. Fisher. 

The application of the theory of probable inference to observations 
ordered in time leads to many perplexities. G. Udny Yule, in his presidential
address to the Royal Statistical Society published in 1926,
pointed out that a sixty-five year period showed a correlation higher 
than .95 between mortality rates and proportion of marriages cele- 
brated in the Church of England. The probability of so high a correla- 
tion, computed in the usual way, seemed to be one in many billions. 
The paradox disappeared when Mr. Yule showed a graph revealing a 
marked trend in both variables with only slight oscillations. The 
lesson is that trend must be carefully eliminated from time series. 
Several methods have been proposed. Perhaps the best method is to 
fit the data with a series of orthogonal polynomials, using in turn those 
of higher and higher degree. Each term can be tested for significance 
before proceeding; and since the functions are orthogonal, the succes- 
sive coefficients are independent, and there is a very great saving in 
calculation. Formulae for variances and correlations in such cases 
make clear just how much of the effect is due to each term of the trend 
equation, and how much to the deviations from trend. In addition to 
trend, “serial correlations” between successive observations give rise to
fallacies, uncertainties and interesting problems. 
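
The effect described by Mr. Yule, and its removal by fitting
orthogonal polynomials, can be imitated with artificial figures. In the
sketch below two series of sixty-five terms are given independent
random deviations about declining straight lines; the numbers are
invented and are not Yule's data. The polynomials are built by the
Gram-Schmidt process, which is one convenient construction assumed for
this illustration.

```python
import math
import random

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def orthogonal_polynomials(n, degree):
    """Polynomials of degree 0..degree, mutually orthogonal over t = 0..n-1."""
    basis = []
    for d in range(degree + 1):
        p = [float(t) ** d for t in range(n)]
        for q in basis:                # remove the part explained by lower degrees
            coef = sum(a * b for a, b in zip(p, q)) / sum(b * b for b in q)
            p = [a - coef * b for a, b in zip(p, q)]
        basis.append(p)
    return basis

def detrend(ys, degree):
    """Deviations of ys from the fitted polynomial trend of the given degree."""
    resid = list(ys)
    for q in orthogonal_polynomials(len(ys), degree):
        # Each coefficient is found separately because the terms are orthogonal.
        coef = sum(a * b for a, b in zip(ys, q)) / sum(b * b for b in q)
        resid = [r - coef * b for r, b in zip(resid, q)]
    return resid

rng = random.Random(1926)
n = 65
mortality = [22.0 - 0.12 * t + rng.gauss(0, 0.3) for t in range(n)]
marriages = [78.0 - 0.25 * t + rng.gauss(0, 0.4) for t in range(n)]

raw_r = correlation(mortality, marriages)
detrended_r = correlation(detrend(mortality, 1), detrend(marriages, 1))
print(raw_r, detrended_r)
```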

For time series displaying seasonal variation the median link relative 
method is in common use. This method has serious defects. It has 
arbitrary elements; it is not amenable to tests of significance; and when 
there is a progressive change in seasonal distribution it breaks down. A
method which avoids all these criticisms is illustrated in a study of 
British birth, marriage and death rates, in which the data are given by 
quarters. By fitting four trend curves, one for each quarter, and tak- 
ing deviations, I eliminate seasonal and secular elements simultane- 
ously. The coefficients of the orthogonal functions in the trend equa- 
tions summarize conveniently such information as the data supply on 
the changes of the total and of its distribution through the year; the 
deviations from trend are almost entirely free from these elements. 
For short series of years and weekly or monthly data, rough methods 
may still be most useful; but in this seventy-eight-year study of British 
vital statistics the link relative method was found decidedly inferior to 
that of quarterly trends. 

Problems of time-series correlation with varying lag, the lag being esti- 
mated by means of the maximum correlation, have never had adequate 
mathematical consideration. Economic statisticians are therefore 
wise in maintaining for the time being a skeptical attitude toward these 
maximum correlations. One method which has been employed to 
study lagging effects in time series begins by forming a pair of seasonal 
indices, giving, for example, typical percentages of employment and 
numbers of marriages for each of the twelve months. These indices are 
then correlated twelve times, using all the possible lags. The true lag 
is supposed to correspond to the maximum correlation if this is large 
enough. How large this maximum correlation must be to possess significance
is not known; but the problem would be easy if we had the solu-
tion of the following problem in geometry: In ten-dimensional space are 
drawn what correspond to a regular tetrahedron and a sphere with the 
same center. For every possible radius of the sphere, what proportion 
of its area is outside the tetrahedron? 
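
Pending a mathematical solution, the danger of the maximum correlation
may at least be shown by experiment. The sketch below forms pairs of
twelve-month indices from independent random numbers (an assumption
made here; real seasonal indices are not so simple), and counts how
often the correlation exceeds .5 at a single fixed lag and at the best
of the twelve cyclic lags.

```python
import math
import random

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def max_lag_correlation(a, b):
    """Largest correlation of a with b over every cyclic lag of b."""
    n = len(b)
    return max(correlation(a, b[k:] + b[:k]) for k in range(n))

rng = random.Random(12)
trials = 2000
single, best = [], []
for _ in range(trials):
    a = [rng.gauss(0, 1) for _ in range(12)]   # unrelated by construction
    b = [rng.gauss(0, 1) for _ in range(12)]
    single.append(correlation(a, b))           # lag zero only
    best.append(max_lag_correlation(a, b))     # best of twelve lags

share_single = sum(r > 0.5 for r in single) / trials
share_best = sum(r > 0.5 for r in best) / trials
print(share_single, share_best)
```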

The great value of “Student’s” method of dealing with means consists
in eliminating the assumption of a particular value for the unknown
true variance. The distribution which he found involves no unknown
parameters. This advantage attaches also to the ratio of two
standard deviations, but not to their difference. Indeed, the calculation
of the probable error of a standard deviation now appears to be an
inefficient procedure, since σ does not have a normal distribution except
in the limit for very large samples. The exact distributions needed
were discovered by Fisher. Tables of them are given under the name of
“The Analysis of Variance” in his book, Statistical Methods for Research
Workers. (The third edition of this book contains a list of his pub- 
lished writings.) He illustrates the analysis of variance by an agricul- 
tural plot-yield experiment. The total variance of yield among the 
plots is analyzed into a part due to the differences among the strains of 
wheat used in different plots, a part due to different treatment of plots, 
and a residual part due to soil heterogeneity and other causes. The 
errors likely to arise in these apportionments can be estimated by 
means of the newly discovered distribution. 
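
The apportionment of variance can be shown in its simplest form, with a
single classification only. The sketch below is a simplification of the
plot-yield example, which has two classifications and a residual; the
yields are invented. The ratio of the two mean squares is the quantity
to be referred to the tables.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Split the total sum of squares into parts between and within groups."""
    all_values = [y for g in groups for y in g]
    grand = mean(all_values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ratio = (between / df_between) / (within / df_within)
    return between, within, df_between, df_within, ratio

# Hypothetical yields for three strains, three plots each.
yields = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
between, within, df_b, df_w, ratio = one_way_anova(yields)
total = sum((y - mean([v for g in yields for v in g])) ** 2
            for g in yields for y in g)
print(between, within, total, df_b, df_w, ratio)
```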

The correlation ratio has a distribution which Fisher derives from 
that of the ratio of two variances, and which I have obtained in a quite 
different manner by means of the geometry of the n-dimensional 
sphere. The probable error of the correlation ratio may now be re- 
garded as an obsolete concept; the assumption of a normal distribution 
is in this case an extremely crude approximation, and is unnecessary 
since the discovery of the exact distribution. 

The distribution of the partial correlation coefficient is identical with 
that of the simple correlation coefficient; but the elimination of each 
variable is equivalent to reducing by unity the number of pairs corre- 
lated. The distribution of the multiple correlation coefficient is an- 
other matter; Fisher discovered it first for the simple case of an un- 
correlated population, and later for the more general case. When a 
multiple correlation coefficient is calculated from a sample, limits can 
now be found between which the true multiple correlation may with 
specified reliability be said to lie. Prior to 1929 this was not possible. 
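
The rule for partial correlations lends itself to a brief sketch. The
usual formula for eliminating one variable is used, and the test ratio
is then computed as for a simple correlation, with the number of pairs
reduced by one for each variable eliminated. The function names and the
numerical values are assumptions of this illustration.

```python
import math

def partial_correlation(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 eliminated."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def t_for_correlation(r, n_pairs, eliminated=0):
    """Ratio t for testing r, with the pairs reduced by the variables eliminated."""
    effective_pairs = n_pairs - eliminated
    df = effective_pairs - 2
    return r * math.sqrt(df / (1 - r * r)), df

r_partial = partial_correlation(0.5, 0.5, 0.5)
t_value, df = t_for_correlation(r_partial, n_pairs=30, eliminated=1)
print(r_partial, t_value, df)
```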

A spirited controversy took place about 1923 regarding the use of χ²
in testing homogeneity or independence by means of contingency 
tables, which represent distributions of observed frequencies among 
cells arranged by a double classification in rows and columns. To de- 
termine whether one mode of classification is independent of the other 
we may construct a table having the same marginal totals, but with 
perfect proportionality in the entries. This artificial table represents 
ideally perfect independence and homogeneity. We calculate the 
measure of goodness of fit of the observed to the theoretical table. The 
difference of opinion concerned the number of degrees of freedom to be 
allowed for in interpreting the result in terms of probability. It now 
appears, at least to many of us, that Fisher was right in holding that the 
correct number is the maximum number of cells which can be filled in 
arbitrarily without conflicting with the marginal totals. 
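
The procedure just described may be sketched as follows. The
theoretical table is built from the marginal totals, χ² is summed over
the cells, and the degrees of freedom are counted as the number of
cells that can be filled arbitrarily, which for r rows and c columns is
(r − 1)(c − 1). The table of frequencies is invented.

```python
def chi_square_independence(table):
    """Return chi-square and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Entry of the artificial table with perfect proportionality.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

observed = [[10, 20], [20, 10]]        # hypothetical fourfold table
chi2, df = chi_square_independence(observed)
print(chi2, df)
```

For the fourfold table only one cell is free once the margins are
fixed, so the probability is read with one degree of freedom.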

In the same way it appears that in using χ² to test the goodness of fit
of frequency curves, the number of disposable constants should be 
deducted from the number of classes to obtain the value of n with which 
to enter the probability table. Even this correction may not give 
exactly the correct probability, for various reasons. One of these is the 
effect on the χ² distribution of small frequencies, which has been
studied by B. H. Camp. 

Incidentally, the calculation of tetrachoric or biserial r is not the 
proper way of testing whether a contingency table exhibits independ- 
ence. These methods are appropriate only to the more specialized 
problem of estimating the amount of correlation for a normal surface. 

The search for cycles by a systematic method of periodogram analysis 
probably began a little before 1900 with Arthur Schuster’s Terrestrial 
Magnetism. There have been modifications of Schuster’s method, but 
all involve picking out a period corresponding to the maximum “inten- 
sity” which can be calculated from the observations. The chance of 
this maximum being spuriously large was investigated by Sir Gilbert 
Walker, who deals with weather cycles. His approximate solution was 
last year corrected by Fisher in a manner analogous to “Student’s” 
correction of the assumption of the normal distribution of the ratio of a 
mean to its standard error. Other periodogram methods have been 
introduced by Whittaker and Robinson, by Dinsmore Alter and by 
others. 
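
A periodogram and a test of its largest ordinate may be sketched as
follows. The intensities are computed at the trial periods n/k, and the
ratio g of the largest intensity to the sum of all is referred to the
exact probability series given by Fisher for normal observations. The
series analyzed (a cycle of ten terms buried in random deviations) and
all the names used are assumptions of this illustration.

```python
import math
import random

def periodogram(xs):
    """Intensities at the periods n/k for k = 1 .. (n-1)//2."""
    n = len(xs)
    mean = sum(xs) / n
    out = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        a = sum((x - mean) * math.cos(w * t) for t, x in enumerate(xs))
        b = sum((x - mean) * math.sin(w * t) for t, x in enumerate(xs))
        out.append((a * a + b * b) / n)
    return out

def fisher_g_probability(g, m):
    """Chance that the largest of m intensities is at least the fraction g of their sum."""
    total = 0.0
    for j in range(1, int(1 / g) + 1):
        term = math.comb(m, j) * (1 - j * g) ** (m - 1)
        total += term if j % 2 else -term   # alternating series
    return total

rng = random.Random(1929)
n = 100
series = [math.sin(2 * math.pi * t / 10) + rng.gauss(0, 1) for t in range(n)]

intensity = periodogram(series)
k_max = intensity.index(max(intensity)) + 1   # cycles in the whole series
g = max(intensity) / sum(intensity)
p = fisher_g_probability(g, len(intensity))
print(n / k_max, g, p)
```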

These distributions of simple, partial and multiple correlations, of 
“Student’s” ratio, of variances and ratios of variances, of correlation
ratios and periodogram intensities, have all been derived on the assump- 
tion of sampling from normally distributed populations. But usually 
we do not know definitely whether our populations are normally dis- 
tributed or not. There is some reason to think that the distributions 
which have been derived are good approximations to the truth, at least 
with moderately large samples, even if the population sampled is some- 
what removed from normality. However, this matter requires far more 
mathematical attention than it has yet received. The problems are 
difficult; but the purely mathematical studies can be supplemented by 
experiments with the apparatus of games of chance. Such experiments 
have, in fact, served more than once to advance the mathematics of 
probability in the past. Contributions to this subject have recently 
been made by W. A. Shewhart and F. W. Winters, by Paul R. Rider 
and by Egon S. Pearson. J. O. Irwin and others have studied dis- 
tributions related to Pearsonian curves. 

Though the study of exact distributions derived from non-normal 
populations appears in these cases extremely difficult, the same is not 
true of the problem of obtaining the first few moments (in the probabil- 
ity sense) of the first few calculated moments of observations derived 
from such populations. Many such problems have been solved; but in 
going on to the higher moments the solutions become enormously cum- 
bersome, and the chances of error excessive. To obviate these difficul- 
ties ingenious devices are needed. Semi-invariants were used in such 
sampling problems in papers published independently by Fisher and 
by C. C. Craig. 

Progress has recently been made also in matters of great philosophical 
interest. The very foundations of the logic of statistical inference have 
been attacked, and new bases proposed. More and more it has become 
clear that, in attempting to measure degrees of rational belief, we must 
use yardsticks different from the traditional scale of mathematical 
probability, and incommensurable with it. The notion that the prob- 
abilities of all uncertain propositions could, if only we were clever 
enough, be expressed in linear order as numbers between zero and one, 
is one of the great phantoms which, like the Holy Roman Empire, 
have dominated men’s minds for centuries and obstructed the rise of 
better systems. Nowadays it is coming to be realized that mathemati- 
cal probability as defined in the books is sufficient only for what may be 
called “deductive probable inference,” such as the prediction of the
limits within which the number of white balls in a sample from an urn of 
known composition may be expected to fall. Inductive inference, the 
estimation of a population with the help of a sample, is a different mat- 
ter. The degree of confidence which may reasonably be placed in an 
inductive inference has no natural measure on the scale of deductive 
probability, but requires for its apprehension an entirely different sys- 
tem of measurement. To set up a correspondence between these two 
distinct scales of rational belief is possible only on the basis of arbitrary 
assumptions having no real meaning. If we know that a coin is un- 
biased, and successive tosses independent, we can, in accordance with 
long established principles, say that the probability is approximately 
20/21 that in one hundred throws the number of heads will be between 
forty and sixty. However, if all we know is the number of heads and 
the number of tails in an actual series of tosses, it is impossible to derive 
from any of the usual definitions of probability alone a numerical value 
for the probability that in a further series of tosses the proportion of 
heads would be between 40 per cent and 60 per cent. 
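
The deductive calculation for the unbiased coin can be carried out
exactly. The sketch below sums the binomial terms; since “between
forty and sixty” may be read as including or excluding the end values,
both readings are computed. The function name is an assumption of this
illustration.

```python
import math

def prob_heads_between(n, low, high):
    """Exact chance that the heads in n fair tosses number from low to high inclusive."""
    favorable = sum(math.comb(n, k) for k in range(low, high + 1))
    return favorable / 2 ** n

inclusive = prob_heads_between(100, 40, 60)   # forty and sixty counted
exclusive = prob_heads_between(100, 41, 59)   # forty and sixty not counted
print(exclusive, 20 / 21, inclusive)
```

The approximate figure 20/21 falls between the two exact values.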

This does not mean that experience has no measurable effect upon 
the credibility to be attached to hypotheses regarding the conditions 
from which the experience is derived. It only means that the tradi- 
tional apparatus of mathematical probability is not, by itself, sufficient 
for the purpose. At least two new measures of belief are available, 
both related to mathematical probability, but not identical with it. 
One of these is called by Fisher “likelihood,” and is appropriate for 
comparing any two specific hypotheses as to the nature of the popula- 
tion—that is, of the causal matrix—from which the observed sample 
arose. The relative likelihood of the two hypotheses is the ratio of the 
probabilities calculated in these two ways of obtaining exactly the ob- 
servations which have been obtained. Thus, if we are inquiring as to 
the true value of a quantity of which we have a succession of measure- 
ments derivable under conditions consistent with the normal error 
curve, we have an infinite number of hypotheses as to this true value. 
Any two of these hypotheses can be compared by the method of likeli- 
hood. One particular value of the unknown quantity will correspond 
to a greater likelihood than any other, and if we are to adopt some one 
figure, this value corresponding to the maximum likelihood seems the 
most reasonable estimate of the true value. 
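
The comparison of hypotheses by likelihood may be sketched for the case
just mentioned. The measurements below are invented, the standard
deviation of the errors is assumed known, and the maximum is located by
a simple search over trial values. None of this is Fisher's own
notation.

```python
import math

def log_likelihood(mu, xs, sigma=1.0):
    """Logarithm of the probability density of the observations if the true value is mu."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def relative_likelihood(mu_a, mu_b, xs, sigma=1.0):
    """Ratio of the likelihood of hypothesis mu_a to that of mu_b."""
    return math.exp(log_likelihood(mu_a, xs, sigma) - log_likelihood(mu_b, xs, sigma))

measurements = [9.8, 10.1, 10.4, 9.9, 10.3]   # hypothetical
sample_mean = sum(measurements) / len(measurements)

# Try a range of hypotheses and keep the one of greatest likelihood.
candidates = [9.5 + 0.01 * i for i in range(121)]
best = max(candidates, key=lambda mu: log_likelihood(mu, measurements, 0.25))

print(sample_mean, best, relative_likelihood(10.1, 10.4, measurements, 0.25))
```

For the normal error curve the value of greatest likelihood coincides
with the arithmetic mean of the measurements.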

The other new measure of credibility arises in connection with the 
question whether an estimate, such as a mean or correlation coefficient 
derived from a sample, deviates significantly from a theoretical value, 
for example from zero, or from a similar estimate derived from another 
sample. In such cases it seems reasonable to adopt a policy of rejecting
those hypotheses for which the probability P of a greater discrepancy
falls below some fixed small value, such as .05. If we insist upon a 
higher standard of reliability in our results we may set this probability 
as .02, .01, or even less. However, if we adopt too drastic a standard of
significance, we shall find it difficult to attain, and shall have to discard 
a large part of the results of our scientific labors. The degree of small- 
ness of P then provides, in a sense, a measure of the reasonableness of 
the hypothesis alternative to that on which the probability P was cal- 
culated, when only one such alternative can be considered. It is not 
the probability of this alternative in the old sense; but it has the ad- 
vantage of depending only on the observations, and not at all on any 
vague a priori probabilities. 

The criteria of likelihood and of rejection of hypotheses on the basis 
of the probability of a greater discrepancy have been discussed and 
extended by Egon S. Pearson and J. S. Neyman.

As the amount of evidence or the size of the sample increases, the 
credibility of a hypothesis regarding the true value of a quantity esti- 
mated sometimes increases and sometimes decreases. However, there 
is something which always increases. This may be called the amount of
information. Fisher proposes to measure it by the second derivative 
with respect to the unknown quantity of the logarithm of the likeli- 
hood. This measure has the advantage of being proportional to the 
number in the sample. It is inversely proportional to the variance, 
and, therefore, to the square of the probable error of estimates made by 
the method of maximum likelihood. Fisher is also responsible for the 
notion of “efficiency” in estimation, which may be described as the 
ratio of the amount of information extracted from the sample to the 
amount in the sample. For example, it can be shown that in estimat- 
ing the center of a normal distribution the mean of the sample has a 
smaller probable error than any other estimate. Its efficiency is 100 
per cent. The median, which is sometimes recommended, has so much
larger a probable error that its efficiency is less than 64 per cent. This 
means that the arithmetic mean of sixty-four experiments or observa- 
tions is approximately as good an estimate as the median of one hun- 
dred such observations. In the same way the standard deviation pro- 
vides a better estimate than any other of the dispersion of the normal 
distribution. The mean deviation has an efficiency of 88 per cent. 
Hence the use of the mean deviation, though slightly economical of 
labor in calculation, requires one hundred cases where eighty-eight 
would suffice if the standard deviation were used. For non-normal 
distributions, however, these percentages are not valid, and for particular
distributions the median, or mean deviation, has 100 per cent
efficiency.
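
The efficiency of the median for normal samples can be checked roughly
by experiment. The sketch below draws repeated samples of fifty-one
from a normal population (a size chosen here for illustration) and
takes the ratio of the sampling variance of the mean to that of the
median. In large samples this ratio approaches 2/π, which is the figure
of a little under 64 per cent.

```python
import math
import random
import statistics

rng = random.Random(64)
n, trials = 51, 4000                   # illustrative sample size and repetitions

means, medians = [], []
for _ in range(trials):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Efficiency of the median relative to the mean: the ratio of sampling variances.
efficiency_of_median = statistics.pvariance(means) / statistics.pvariance(medians)
limiting_value = 2 / math.pi
print(efficiency_of_median, limiting_value)
```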

For estimating the values of the parameters which appear in empirical
equations the method of maximum likelihood is to be recommended,
at least in a large class of cases, as having 100 per cent efficiency. The 
Pearsonian system of frequency curves is usually fitted by the method 
of moments, whose efficiency has been shown by Fisher to be in many
cases very small. However, the method of maximum likelihood often 
leads to involved mathematical calculations. The director of research 
must in such cases decide whether to expend his resources in obtaining 
a large number of cases and treating them by an inefficient mathematical
method to save calculation, or whether it is more economical to apply
a more efficient method to a smaller number of cases to obtain the
same degree of accuracy in estimation. This involves a comparison of 
the cost of gathering data with the cost of calculation. 

A further difficulty with the application of the method of maximum
likelihood to empirical frequency curves rests upon the inadequate development
which the mathematics of the subject has at present attained.
For small samples from non-normal distributions, tests of
significance are known in very few cases. This obstacle may, however, 
prove to be only temporary, as mathematical research is now under 
way, in several parts of the world, regarding the exact distributions of 
estimates by the method of maximum likelihood. 


DISCUSSION 


By Walter A. Shewhart


Scientific method in its older form took as one of its cornerstones the 
concept of causal relatedness. All of us, however, have lived to see a 
fundamental change in this cornerstone. Today it is being modified to 
take account of the fact that we no longer believe in the “exact” nature
of physical properties and physical laws. Instead of the older concept 
of exact causal relatedness, we now have the concept of causal indeter- 
minateness—a natural outgrowth of our present concept of the inherent 
statistical nature of the phenomena which we observe and of the rela- 
tionships between these phenomena. To take account of this radical 
change in concept, it has been necessary to bring about radical changes
in the older methods of inductive and deductive inference. The logic 
of discovery is beginning to take on a new form, and today, the subject 
of statistical inference is of interest to all scientists. 

Improvements in statistical inference have been made in so many 
different fields and by so many different people that it would be impos- 
sible to give a complete summary of all of this work in the compass of a 
brief paper. Professor Hotelling has made for us a very interesting 
selection of material. I think that there may be some basis for questioning
the “newness” of some of the things which he records as being
new. Furthermore, it is natural that there should be some difference of 
opinion as to the material that should be included in a paper of this 
kind. 

It appears that many of the improvements referred to come under 
two general heads: (a) the theory of distribution; (b) the theory of 
estimation. Professor Hotelling justly emphasizes the significance of 
recently obtained exact distribution functions for several of the im- 
portant statistics. I would like to point out in this connection the im- 
portance of recent developments in deriving the expected values and 
variances of statistics even where the exact nature of the distribution is 
not known. In this connection the Russian school also has made con- 
tributions. In the practical field we often are able to make use of this 
kind of information through the help of Tchebycheff’s inequality and 
recent improvements thereof by Camp, Meidell and others. In fact, 
these inequalities serve in a manner quite analogous to that of the re- 
mainder term in an infinite expansion, making it possible for us to esti- 
mate the maximum error that we may make in choosing the wrong 
hypothesis as to the nature of the distribution function. Recent im- 
provements in the theory of distribution of the range made by L. H. C. 
Tippett, J. O. Irwin, R. A. Fisher, E. L. Dodd and others have proved 
to be of considerable value in our work. 
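
The use of Tchebycheff’s inequality as a limit of error may be sketched
as follows. Whatever the form of the distribution, the chance of a
deviation of k or more standard deviations from the mean cannot exceed
1/k². The sketch compares this limit with the actual tail areas of two
assumed distributions, the normal and the exponential; the improved
inequalities of Camp and Meidell are not attempted here.

```python
import math

def tchebycheff_bound(k):
    """Upper limit on P(|X - mean| >= k standard deviations) for any distribution."""
    return min(1.0, 1.0 / k ** 2)

def normal_tail(k):
    """The same probability when the distribution is normal."""
    return math.erfc(k / math.sqrt(2))

def exponential_tail(k):
    """The same probability for an exponential distribution (mean 1, s.d. 1), k >= 1."""
    return math.exp(-(1 + k))

for k in (1.5, 2.0, 3.0):
    print(k, tchebycheff_bound(k), normal_tail(k), exponential_tail(k))
```

The limit is wide, but it holds when the hypothesis as to the form of
the distribution is wrong.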

In considering the contributions under the theory of estimation, we 
are treading upon somewhat uncertain ground, at least from a practical 
viewpoint. In fact, there are many who prefer the a posteriori method 
to that of the method of maximum likelihood discussed by Professor 
Hotelling. In this connection, I think that a recent contribution of 
Molina and Wilkinson on the subject of small samples is of considerable 
interest. I feel that the recent articles of Egon Pearson and J. Neyman 
discussing these two methods should receive more emphasis than is 
given to them in the paper now under discussion. This, however, is a 
matter of personal opinion. 

As a matter of fact, as I have pointed out elsewhere, an engineer finds 
that he cannot make use of the customary theory of estimation until he 
has succeeded in eliminating what we have chosen to call assignable 
causes of variation. There is, however, in all of this work of eliminat- 
ing assignable causes, the use of statistical inference in a slightly differ- 
ent way from that described in the articles referred to by Professor 
Hotelling. 

In fact, taking a broad viewpoint of the improvements in statistical 
inference, it seems that recent contributions in the field of the logic of 
discovery by men such as Whitehead, Carmichael, Dubs, J. M. Keynes, 
W. E. Johnson, Broad, Ritchie and Bertrand Russell, and in the field of 
physics by men such as Eddington, Heisenberg, Dirac, DeBroglie and 
Jeans, to name only a few, should come in for brief mention, even 
though it is impossible to do more than that. At least we have found 
the contributions in these two fields to be of great value in the establish- 
ment of a basis for the economic control of the quality of manufactured 
products that will take account of the modern concepts of the statistical 
nature of physical properties and physical laws, and of causal indeter- 
minateness. 


WELFARE AND INSTITUTIONAL STATISTICS IN THE 
UNITED STATES 


By Horatio M. Pollock


In the states in which welfare work is well organized, nearly one- 
fifth of the entire population receive some aid each year. It would 
naturally be expected in such comprehensive work dealing with 
human well-being that careful records would be kept and results would 
be measured, analyzed and fully reported year by year; but, strange to 
say, welfare workers seem to have peculiar antipathy toward good 
records and statistics. Inquiry reveals deplorable inadequacy in 
records and reports in the welfare field in general. In the main this is 
not due to lack of effort or interest. If the energy now being put on 
paper work by the various agencies were well directed, it would 
undoubtedly accomplish its purpose. At present much of the work 
done on records and reports is worse than useless. Forms are filled 
out and tables are prepared and printed with apparently no thought 
of their value or purpose. 

Noteworthy progress, however, is being made in several separate 
fields. A vast amount of strenuous work has been done in recent years 
for the establishment of better systems of records and statistics. In 
the institution group, mention should be made of the persistent 
effort of the National Committee for Mental Hygiene in coöperation
with the American Psychiatric Association, the American Association 
for the Study of the Feeble-minded and the National Association for 
the Study of Epilepsy, for the standardization and compilation of 
uniform statistics of the insane, feeble-minded and epileptic. Much 
credit is likewise due to the American Prison Association and the 
American Institute of Criminal Law and Criminology for their efforts 
on behalf of better prison statistics. 

In the extra-mural group recent noteworthy contributions include 
the following: The work of the Federal Children’s Bureau in the stand- 
ardization and collection of juvenile court statistics; the work of the 
Committee on Registration of Social Statistics in the collection of 
social welfare data from city organizations; the efforts of the National 
Probation Association in standardizing statistics of probation; the 
unique work of the Commonwealth Fund in preparing a manual for 
the recording and compilation of data in child guidance clinics; the 
service rendered by the Russell Sage Foundation in standardizing 
records of outdoor relief; and the exceptionally thorough work of the 
International Association of Chiefs of Police in the study and or- 
ganization of police statistics. 

Through the coöperation of state and Federal departments, many of
the gains outlined have become permanent. The Federal Census 
Bureau is now collecting annual statistics of state institutions for the 
insane, mental defectives, epileptics and prisoners. The Federal 
Children’s Bureau has taken over the collection of welfare data from 
city organizations; and the Bureau of Prisons in the Federal Depart- 
ment of Justice is continuing the work of collection of statistics of 
crime in cities begun by the International Association of Chiefs of 
Police. 

Although the status of welfare statistics in this country is better 
now than at any previous time, we are far from the desired goal. We 
are still unable to answer many fundamental questions relating to 
dependency, delinquency, disease, or mental or physical deficiencies. 
We still lack data to enable us to evaluate the work of private and 
governmental agencies in the welfare field. We have few data to in- 
dicate whether we are gaining or losing ground in our efforts to reduce 
poverty, to prevent venereal disease and to check crime. Who is 
there among us who can plot the curve of dependency for the United 
States as a whole or for any state? Can any of us without prolonged 
research give a reliable estimate of the success of the juvenile court 
or of the Baumes Laws or other measures for the prevention of de- 
linquency? Who can tell the expectancy of dependency or of crimi- 
nality in any state of the Union? How many state welfare depart- 
ments know accurately the size and cost of the welfare problem in 
their own state? How many can tell even approximately the num- 
ber of persons who received aid from public funds in a single year? 

At present our public records of automobiles are much more com- 
plete than are those of men and women. The recent studies of 
Dublin and Lotka on the Money Value of a Man have shown that 
a man is worth more than an automobile. These studies have also 
thrown new light on the economics of social work and on the impor- 
tance of more accurate accounting of all factors involved in human 
welfare. If, from an economic standpoint, the value of the human 
beings constituting the population of the country is five times as great 
as that of all material wealth in the country, it seems but reasonable 
that accurate detailed records should be kept of the members of the 
human family constituting our population. The intellectual and 
spiritual values of men and women far transcend their economic value, 
but the average American has formed the habit of estimating values 
in dollars and cents and of organizing social and governmental agen- 
cies in accordance with such estimates. Considering all human values 
and the great need for better social relations and social control, it 
seems imperative that the principal events in the life of every indi- 
vidual should be systematically recorded. If such records were kept 
of each person it would be comparatively easy to compile adequate 
annual statistics therefrom. 

It is accordingly proposed that provision be made for the annual 
registration of all individuals constituting the population of the 
country. For each individual there should be a family record, a health 
record, a school record, an occupation record, an achievement record, 
a social relations record, etc. These should be permanent records and 
should be accessible for all legitimate purposes, including scientific 
study. 

It is probable that such registration would be best effected by the 
expansion of present bureaus of vital statistics in cities and villages. 
The records should, of course, be standardized and be made uniform 
in scope throughout the country. 

The social advantages to be derived from such records would be 
beyond measure. A complete annual census of the population of each 
community would be made available. Various groups, such as the 
mentally defective, the blind, the deaf, the dependent, the anti-
social, etc., could be segregated and given appropriate relief or super-
vision. The data made available could be used as a guide in planning
welfare work and in efforts for the prevention of dependency and 
delinquency. Family records would tend to promote eugenic mar- 
riages and race improvement. Occupation records would serve both 
employers and employees. Social relations records would be of great 
value to police departments and courts. 

To the individual, from youth to old age, the registration records 
would give a strong incentive to right living. The old admonition to 
the young man, “Make clean your record,” would no longer be a vague
generality but a definite challenge so to live that his record on the 
books of his community might stand before his contemporaries as a 
creditable testimonial and might be passed on unblemished to his
posterity.

DISCUSSION

By Bennet MEAD


I should like to congratulate Dr. Pollock on his courage, independ-
ence and vision in making this very constructive proposal for a com-
plete annual registration of the entire population. I am sure that Dr.
Pollock has in mind that such a registration would be of use not only
for administrative purposes, but also as a basis for compiling accurate 
annual statistics of the population, so that we would no longer be com- 
pelled to rely chiefly on the Census statistics, which are compiled at 
intervals of ten years. This proposal may seem to many of us at the 
present time to be impractical. Certainly, to develop a registration 
system may require much hard work for perhaps many years. But the 
need for more up-to-date population statistics should be fully appre- 
ciated by those of us who have tried to interpret the statistics of special 
classes, such as prisoners and the mentally handicapped, and have
found ourselves seriously hampered by the lack of comparable
information concerning the general population. During recent years, 
for example, we have developed annual statistics involving considerable 
detailed information as to the sex, race, age, etc., of certain institutional 
groups. But we find ourselves (except at the time of the decennial 
census) without any satisfactory population data from which we can 
determine the significance of our institutional statistics. If the sug- 
gestion of Dr. Pollock could be carried out, it would remedy this 
situation. 

When we seriously undertake to develop more adequate nation-wide 
institutional and welfare statistics, certain administrative problems will 
require attention. One such problem relates to the selection of the 
agency, or agencies, which are to be charged with the collection of our 
institutional and welfare statistics. For example, should we, on the 
one hand, intrust the task to a general statistical agency, such as the 
Bureau of the Census, or should we, on the other hand, place the work 
under some bureau or bureaus whose administrative functions are 
connected with the institutional statistics? 

We have at the present time a somewhat acute example of this 
problem in the situation with reference to our statistics of crime and 
its treatment. The Bureau of the Census has been collecting for 
several years, on an annual basis, certain statistics of prisoners. The 
Bureau of Investigation in the Department of Justice has recently 
begun the collection of certain criminal statistics from police agencies 
throughout the country. The Federal Bureau of Prisons, in the 
Department of Justice, collects statistics of all Federal prisoners, in- 
cluding those in the jails as well as those in the Federal institutions. 
Furthermore, the Federal Department of Justice collects extensive 
statistics concerning cases handled in the Federal Courts, and the 
Prohibition Bureau, as well as numerous other bureaus charged with 
enforcing specific laws, compiles statistics of its own activities. Fi-
nally, the Children’s Bureau, in the Department of Labor, collects
annual statistics of juvenile courts; and since last June has been col-
lecting monthly data concerning probation work in a selected group of
cities. 

It is apparent, therefore, that insofar as we have anything approach-
ing nation-wide statistics of crime and delinquency, they are being
collected by a large number of Federal agencies. It is also apparent
that this condition of affairs is unsatisfactory from the standpoint of
getting a proper coördination of the various types of criminal statistics.
While we may find it necessary, at least for a time, to continue the
separate collection of these statistics by different government agencies,
it is very important to bring about the fullest possible proper coördina-
tion of the statistics through coöperation between the respective
agencies; and to plan also to concentrate the collection work under
fewer agencies.

It is to be hoped that our nation-wide institutional and welfare 
statistics may be developed more rapidly than has been the case with 
the registration areas for vital statistics. The nation-wide annual 
collection of mortality statistics was begun in 1900. Now, after thirty 
years, the registration area still does not include quite all of the states. 

An important reason for this slow growth in the registration areas is 
the fact that the Federal Government has acted, in the main, simply as 
a recording agency. A much more rapid growth of our welfare 
statistics could be expected if the statistics were collected incidentally 
in the process of developing a far-reaching program of Federal financial 
aid to the states for more adequate welfare work. We have seen 
during recent years a striking example of the possibilities of such a 
development, in the maternity and infant welfare work developed by 
the Children’s Bureau under the Sheppard-Towner Bill. This work 
not only was tremendously successful in improving the organizations 
and standards of care in the coöperating states, but also, as a by-
product, has furnished us with a remarkable body of statistical data
concerning child health activities. Furthermore, this work greatly 
accelerated the growth of the birth registration area, and was largely 
responsible for the fact that in about fifteen years this area has grown 
to cover about the same area which has been attained by the mortality 
statistics area after thirty years. 

The time has come when we should recognize that the Federal 
Government ought to supplement generously the efforts of the states in 
promoting the social welfare. Once this principle is accepted, there is 
no good reason why the state aid principle which has in the past been 
applied so effectively toward improving the public roads, and more 
recently toward promoting child welfare activities, should not be ap-
plied to bring about better care of all the underprivileged, handicapped
and maladjusted groups, regardless of age. It is important to collect 
social statistics. But it is still more important to set up effective na- 
tional machinery for dealing adequately with social problems. Social 
statistics can be most effectively developed, when assigned their proper 
function as a basis and a by-product of social service administration. 







A STATISTICO-LEGAL STUDY OF THE DIVORCE PROBLEM 


By Leon C. MARSHALL 


I have been asked to present a “case study” of the development of
institutional statistics, using as the “case” an attempt now in process
to improve our basic data in the field of divorce, as handled by the 
courts. 

The development of institutional statistics, it is a commonplace to 
observe, should be in terms of the purposes which are to be served by 
the data. It is convenient to think of the purposes of judicial divorce 
statistics as being mainly two: (1) provision of data and reports for the 
information and guidance of the officials of the judicial system; and (2) 
provision of data and reports for the information and guidance of the 
general public, including in this public not only the ordinary citizen 
but also legislators and students. 

It is obvious that a decision as to what extent institutional statistics 
should be elaborated calls for balance and judgment. As a practical 
matter, these statistics are ordinarily compiled by members of the staff 
of the institution concerned and should not be elaborated beyond the 
reach of the training of the personnel which will in practice be available 
to do the work. In particular, the exercise of judgment is called for 
in deciding how far to elaborate the data to be used in “social report-
ing.” It is a good general principle that the personnel of the institu-
tion should not be called upon to develop data for “social reporting”
which would be of no use for purposes of internal administration, unless 
there are positive and convincing reasons for their taking on this 
burden. 

The most cursory examination of our statistics of divorce indicates 
that in their present form they fall far short of the criteria set up in the 
preceding paragraphs. 

Our most elaborate collections of divorce statistics are those pub- 
lished by the Federal Government. The first collection, made by the 
Commissioner of Labor, Carroll D. Wright, covered the period from 
January 1, 1867, to December 31, 1886. The second was made by the
Bureau of the Census and covered the period from 1887 to 1906.¹ The


1 United States Bureau of the Census, Marriage and Divorce, 1867-1906. (Results of two Federal in- 
vestigations. The first, made by the Department (now Bureau) of Labor under the direction of Carroll 
D. Wright, covered the period of 1867 to 1886. The second, made by the Bureau of the Census in ac- 
cordance with a joint resolution approved February 9, 1905, was conducted under the supervision of 
William C. Hunt, chief statistician for population, and covered the period 1887 to 1906.) 







third is limited to the calendar year 1916;¹ and beginning with the
calendar year 1922 there have been issued by the Bureau of the Census
annual compilations of data covering marriage and divorce.²

Even if there were time, this is not the occasion for a detailed analysis 
of these compilations of divorce statistics issued by the Federal Gov- 
ernment. What they cover in their current form is sufficiently re- 
vealed by an examination of the card used to collect the data. This 
card is sent to the clerks of court in all states except sixteen states from 
which the data are secured from some state office. It calls for the 
following information: the date of the marriage; whether the divorce 
was granted to the husband or to the wife; whether the case was con- 
tested; the date at which the divorce became effective; the cause (legal 
cause, of course) for which divorced; and the number of minor children 
affected. 
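
The items called for by the card can be pictured as a single record per divorce. What follows is only a minimal sketch in Python, not the Bureau's own layout: the names `DivorceCard`, `GrantedTo` and `years_married`, and the sample values, are hypothetical and chosen for illustration. The derived figure (completed years of marriage) is included merely to show the kind of tabulation such a record permits.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class GrantedTo(Enum):
    """Party to whom the divorce was granted (hypothetical coding)."""
    HUSBAND = "husband"
    WIFE = "wife"


@dataclass(frozen=True)
class DivorceCard:
    """One card, holding the six items listed in the text.

    Field names are illustrative assumptions, not the card's wording.
    """
    date_of_marriage: date
    granted_to: GrantedTo
    contested: bool
    date_effective: date
    legal_cause: str
    minor_children: int

    def years_married(self) -> int:
        """Completed years between marriage and the effective date."""
        years = self.date_effective.year - self.date_of_marriage.year
        anniversary_passed = (
            (self.date_effective.month, self.date_effective.day)
            >= (self.date_of_marriage.month, self.date_of_marriage.day)
        )
        return years if anniversary_passed else years - 1


# Invented sample values, for illustration only.
card = DivorceCard(
    date_of_marriage=date(1921, 6, 15),
    granted_to=GrantedTo.WIFE,
    contested=False,
    date_effective=date(1929, 3, 1),
    legal_cause="gross neglect of duty",
    minor_children=2,
)
print(card.years_married())
```

Nothing in such a record touches the economic or social circumstances of the parties; it holds only what the card asks for.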

Not greatly dissimilar from the compilations issued by the Bureau of 
the Census are the data available in certain states which have estab- 
lished state registration of divorce. 

Another compilation of figures on divorce will be cited—the judicial 
statistics of Ohio; cited because, being rather above the average of 
similar compilations, they may be presented as an example of the better 
current practice. The tabulation covers for each county, classified 
according to each of the major legal causes of divorce, the following 
data: number of suits pending at the opening of the fiscal year, number 
filed within the year, number decided within the year, number pending 
at the end of the year. It shows how many suits were brought by the 
husband, and how many by the wife; how many were granted to each; 
how many were refused to each and dismissed; in how many cases ali- 
mony was granted to the wife; how many children were given to father, 
to mother, and to grandparents. 
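
The four docket counts in such a tabulation lend themselves to a simple check of internal consistency. The sketch below, in Python, assumes that every suit leaving the docket during the year is counted as "decided," so that suits pending at the opening, plus suits filed, less suits decided, should equal suits pending at the end. That identity is an assumption made for illustration and is not stated in the Ohio reports; the name `DocketRow`, the county label and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocketRow:
    """One county-by-cause row of the tabulation (illustrative names)."""
    county: str
    legal_cause: str
    pending_at_opening: int
    filed: int
    decided: int
    pending_at_end: int

    def balances(self) -> bool:
        """True when opening + filed - decided equals the closing count.

        Assumes that all suits leaving the docket are counted as decided.
        """
        expected_end = self.pending_at_opening + self.filed - self.decided
        return expected_end == self.pending_at_end


# Invented figures, for illustration only.
rows = [
    DocketRow("County A", "gross neglect of duty", 40, 120, 110, 50),
    DocketRow("County A", "extreme cruelty", 25, 60, 70, 20),
]

# Rows whose counts do not reconcile would be referred back to the clerk.
unbalanced = [row for row in rows if not row.balances()]
for row in unbalanced:
    print(row.county, row.legal_cause)
```

A row that fails the check is not necessarily in error; it may only mean that some suits were disposed of in a manner the report does not count as decided.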

If one thinks back over the foregoing brief description of typical 
divorce statistics of today, these observations are pertinent: (1) The 
data are of relatively small significance to the administrators of our 
judicial machine—are of little use as aids to administration; (2) they 
are also rather ineffective for purposes of social reporting. Although 
many interesting and valuable facts have been made available as to 
the amount of divorce and the formal or legal causes on which divorce 
actions are based and the minor children affected, the data are seri- 
ously lacking as to the economic, social and psychological backgrounds 
of the divorce problem; (3) the data are not gathered by the personnel
of the judicial system as something which may be made to contribute
to the effectiveness of the system; rather they are assembled on
request by an outside agency.

1 United States Bureau of the Census, Marriage and Divorce, 1916.
2 Ibid., 1922-.

[Data sheet reproduced on pages 98 and 99 of the original: “Analysis of
Divorce, Alimony, Annulment Cases Disposed of in Common Pleas and
Probate Courts of Ohio, July 1, 1930 to December 31, 1930.” Legible
headings include Status of Principal Parties, Special Questions,
Permanent Relief Sought, Character of Judgment, Special Relief
Granted, Mode of Disposition of Case, and Permanent Custody of Minor
Children.]

I turn now to a description of a piece of work which looks toward the
formulation and installation of a more comprehensive system of judi-
cial divorce statistics, a piece of work now in process under the aus-
pices of the Judicial Councils of Ohio (a “code” state) and Maryland
(a “common law” state).

Logically, this work may be said to break into three parts: (1) find- 
ing out what should be included in such a system of statistics; (2) think- 
ing through, for the underlying records and documents, an organization 
which would make the required data available; and (3) securing the 
installation of the system. It is hardly feasible to present the descrip- 
tion of the work under these three headings, however, since a given 
activity frequently cuts across two or even all three parts. It seems 
best, therefore, to describe one activity after another, even at the risk 
of making a tiresome catalogue. 

1. The first activity is that of exploring what actually happens in the 
courts; what data are available under the existing schemes of docu- 
ments and records; at what points the system seems to lag in effective- 
ness; in what respects a divorce action tends to reach out into other 
actions. This exploration is being conducted (a) by means of the 
attached data sheet used for inquiry into the court documents and 
records; (b) by a corresponding inquiry into the procedure of attorneys, 
judges and examiners; and (c) by an “observer” study of courts in 
operation. (Since these last two inquiries are in only their formative 
stages, the mere mention must suffice.) 

2. Another activity is that of studying these classifications in terms 
of the practices in all the other states—that of working up uniform 
(or better, comparable) systems of classification, so that any classifica- 
tions eventually utilized in Ohio and Maryland will be as comparable 
as possible with classifications which may be utilized in other states. 

3. Still another activity is concerned with exploring the individual 
and social backgrounds of the divorce problem as a means of ascertain- 
ing what data of these types, if any, should be secured and can be 
secured in the course of the usual court procedures. It goes without 
saying that this exploration may yield results not connected with the 
formulation of institutional statistics, but only the results connected 
with this formulation are here under consideration. 

On the basis of advice and suggestions from students of the divorce 
problem, the accompanying data sheets for answers by plaintiffs and 
defendants have been put in use in Ohio. Their use has been volun- 
tary in the sense that no legal compulsion exists for securing the an- 





ii. 


| | 





yr 


885 
the 
Pre 


1. Never a 
..4. Attende 


not 
..8. Comple' 


..4. Attende 

did not 
...5. Complet 
...6. Attende 


not fini: 
..7. Complet 








Proceedings 


101 


Use carbon to make duplicate. Fill im pages 1 and 4; tarn carbon; then fill in pages 3 and 8. 


Keep one copy for court files. Send duplicates monthly to Miss Nettie Nulton, State House Annex, Columbus, 0. 


PLAINTIFF IS NOT TO FILL 


















































e,@2 
Petition for IN DATA IN THIS BOX 
° bd Docket No. 
Divorce, Alimony, or Annulment = [tress e 
BNGD De Bcccceccccencccuscsecnssess —_ 
J oe 
County —_ Statement by Plaintiff |... ..... 
The Plaintiff should exercise great care im filling out this blank. At the Oe a ee ok 
hearing, the replies will be attested under oath. The plaintiff will supply 4 ts 
information concerning defendant to the best of plaintiff's knowledge Date this blank filled out_...........-... 
and belief. 
Btrest ateck here. 
Present 
pe paiati® First Name Middle Name Last Name Address City State 
Street Gneek bore__. 
Full Name Present 
of Defendant First Name Middle Name Last Name Address City State 
Name of 
Pitt's Atty. Address 
previo , al A 
ee a a an the a OP A, GREE CUI iccccicccccerens WRB ienerccnsenesabeesasne poe 
. . 
A. Personal Data Concerning Parties 
1, Age, Nationality, Citizenship and Residence 
BOW LONG SSS(DENT (LAST TIME) In Citisen 
Date of Birth wLace OF S1ETS United States | Onto te come of U. 8? 
| Mos | Yrs. | Mos | Yre | Mos Check 
fa Mo. Yr. State or Country Yrs. | rs. — 
panes If on nearby farm, check here. ---2. No 
Wite | ~~ 1. Yes 
If on nearby farm, check here... } ...2. No 



































2. Educational Level Attained 
(Check [with in hand rh 
sa an Kander “ep Oae goes | Sous (ou 
be eufflotent) " - 
Gusband Wite | Husband 
~.l. Never attended school | 
4. Attended grade school but 2. agtich 
did not finish at a 
..3. Completed grade school aoe Englis 
4. Attended high school but 4... | ---3- 
did not finish 
...5. Completed high school a English 
a Attended college but did 6. ---4. 
not fini 
2 : English 
mpleted college T—- | 5. able to 
..8. Facts concerning schooling 8... 
unknown 





3. Literacy 


in left hand column for husband 
column for wi ee as many 


fe. 


speak English 


...6. Unable to speak English 


a * oe to read and write in 
ae & read and write in 


Able to read and write in 
some language other than 


Unable to read and write in 
some language other than 


4. Religious Affiliation 
(Cheok (with @) onee in each column) 
Husband Wife 








Wife 1. 
1... | 2. 
ee 

4 
Dens 

6. 
ae 

7 
Gis 52 
ie, ee 











Protestant coe | ams 
Jewish con Ea 
. Roman Catholic ome. ens 
. Greek Catholic — Fe * 
Other — To 
. None cule | eonhls 
. Unknown — ee 
(Check as to attendance) 
. Attendance frequent a 2 
. Attendance infrequent Se * 











5. Occupation 


marriage of parties 


-2 Present occupation 


(State in detail) 


ee eee ee .3 Name of present 
employer 
ldnaidiiainsthidabidabiitdniniee meinen, Ge 
employer 
eReaieneausmrmenaniaebinamaniRR -5 Special occupational 
training 


(Write in for both husband and wife. Name the trade, profession or particuler kind o, 
Husband 


elinaeenstebensensindeeetneneeibeideadeinen -1 Occupation at time of 1. 


3. 
4. 
6. 


(Bpectal Ocoupetions! Pretning refers to troining in etenogrephy, nurving, out: 


f work, as epinner, saleaeman, riveter, teacher, etc.) 











American Statistical Association 





7. Financial Status 





. Estimated value of prop- 
erties, at present time 





. Debts, present time 





. Average weekly earn- 
ings, present time 





. Total annual income, 
present time 





- Children; if so, 
how many... 








m 
_.-3. Other, including 3... 
Mexican 
_.4. Unknown 4.200 


. If joint ae 4 holdings, state 





value of assets above liabilities $ 


. Relatives; if so, ~~ 


how many___ 








B. Data Concerning Present Marriage Relationship 


9. The Marriage and Separation 


1. Date of marriage: 2. 





Moath Year 


3. pane < final separation (or, if not earlier sepa- 
of decision to file petition for divorce) acai 
jon 


Year 





5. How many separations prior to this final one. 
7. Age of husband at time of marriage. 





Years Months 





. Age of wife at time of marriage. 
Years Months 


11. Underscore appropriate word indicating whether the marriage 
ceremony was CIVIL, RELIGIOUS, COMMON LAW 


. Period parties were acquainted before marriage___ 


10. 


Place of marriage. 





City State or Country 


. Which party left home at this final separation__ 
. Number of children of this present marriage 


(include both living and dead) Living. Dead____. 


-+-----——— 


in Me 


Period parties were engaged before marriage... 
ir Me 





10. LIVING Children This Present Marriage 
(Notice that thie section deals only with Living Children) 





State or Count 
Names (start with eldest) | te Which Born 


fi 


Pe pepo 















































Foster Children, if any, of this marriage 





























11. Residence: Urban or Rural 
? 


(Bater number of years this family has lived in each type of community 
SINCE MARRIAGE to time of final seperation) 


- Years resident in large cities (over 25,000 population) 
. Years resident in small cities (2,500 to 25,000 population) ________ 


. Years resident in town or village under 2,500 population 





. Years living on a farm 


12. Marital Status of Parties Immediately Prior to 
Present Marriage 


ae a a 
column) 
Husbend 

Single 

Married 

Divorced 

Widowed 

















Proceedings 103 


C. Data Concerning Previous Marriages (if any) of Parties 


18. History of Previous Marriages 14. Children of Previous Marriages 
Husband Wife Husband Wife 


common law 1. Number of children of these 
both and dead) 


Number of times widowed 
2. Number of these children of previcus 
0 


Number of times divorced now 


15. Three Immediately Prior Marriages, if had, of Husband 
Name Swe Rs Place of Prior Marriage yng 
Spouse Marriage 
a State or Country When Wee, 
~~"2. Spouse 
me Spouse 


16. Three Immediately Prior Marriages, if had, of Wife 
Date of Place of Prior Marriage If Divorce. to 
Marriage = Whom Wi 
Name of Spouse Year State or Country t Decree Awarded? 


aie 
eee Spouse 
wes 


17. LIVING Children of Husband by Earlier Marriage or Marriages 
(Notice that thie epplice only to Living Children) 


Names (start with eldest) Age — oS Saw in Present Address 


18. LIVING Children of Wife by Earlier Marriage or Marriages 
(Notice that thie eppites only to Living Children) 


Nawes (start with eldest) age yt Country on Present Address 


D. Data Concerning Immediate Relatives of Parties 


19. Concerning Parents 
Place of Birth 


Year of Trade, profession, or particular kind of work, 
Birth State or Country as spinner, salesman, riveter, , ete. 





American Statistical Association 


19. Concerning Parents—.(Continued ) 


Were husband's parents separated or divorced? 
(Check) ......1. Yes enna: Ti 
Until what age did husband live with his parents?__ 


If husband’s family was broken by death of a parent, or by 
separation or divorce of parents, how old was husband at 


that time? 


How many brothers and sisters did husband have (counting 
both living and dead)? __ 

Counting both living and dead, and counting the eldest — 
as first, what was the order of birth of husband in hb 


family? 
How many of the husband's brothers and sisters have ever 


ted, deserted, or divorced? 


2. Were wife’s parents separated or divorced? 
(Oheck)  ......1. Yes csscesle Wb 
4. Until what age did wife live with her parents?_____ = 


6. If wife's family was broken by death of a parent, or» 
separation or divorce of parents, how old was wit: ¢ 


2. How many brothers and sisters did wife have (county 
both living and dead)? 

4 Counting both living and dead, and counting the eldest ay 
as first, what was the order of birth of wife in 


en 
6. How many of the wife’s brothers and sisters have oe 


been separated, deserted, or divorced? .......... 


been separa 
How many of the husband's brothers and sisters have ever 8. How many of the wife’s brothers and sisters have ee 


married? married? — 


E. Home Surroundings of This Family when Last Living Together 


21. Tenure of Home
(Check. Use only one check)

....1. Rented
....2. Owned free
....3. Owned mortgaged
....4. Owned, mortgage state unknown

(Enter amounts asked)
1. If rented, what monthly rent ......
2. If owned, value of home ......

22. Type of Dwelling
(Check. Use only one check)

....1. House, attached or detached
....2. Apartment
....3. Light housekeeping rooms
....4. Rooms with or without board
....5. Basement rooms only

23. Size of Dwelling: Occupants
(Answer for dwelling of this family)

1. Number of rooms (not counting halls, attic, basement and bath rooms) ......
2. Number of families living in these rooms ......
3. Number of persons living in these rooms ......
4. Number of husband's relatives living in these rooms ......
5. Number of wife's relatives living in these rooms ......
6. Number of lodgers (excluding relatives) living in these rooms ......





24. Household Equipment and Household Help 


1. Underscore labor-saving devices which were used in the home: 2. Underscore which of following were owned: 
VACUUM CLEANER, WASHING MACHINE, AUTOMOBILE, PIANO, RADIO, VICTROLA, TELEPHONE
MACHINE 


3. Was home lighted by gas or electricity? 
5. Was the home furnace-heated?
7. Was all or most of the laundry sent out? 


25. Latest Residences at Which Parties Lived Together 


Number and Street City
(If farm, merely check)


4. Was there running water in the dwelling? ...... 
6. Was household help employed as much as one day in tw 











State or Country 


26. Economic Adequacy and Health 
1. When last living together did the family have substantial difficulty living within its means? ......
2. At that time did husband have serious chronic illness or disability? ...... If so, what? ......
3. At that time did wife have serious chronic illness or disability? ...... If so, what? ......
Ne ee Ce eT ay NT OP CLUNENNT os 
so, what? 






















swers sought, although some of the judges have been quite earnest in 
expressing their desire that the data be turned in. Returns are coming
in from most of the counties of the state and it seems probable that 
3,000 complete schedules will be turned in. 

Of course, no one will suppose that these plaintiff-defendant data 
sheets are thought of as component elements of a permanent system of 
institutional statistics in this field. It may be that relatively little of 
their material can be or should be salvaged for this purpose. They are 
exploratory devices used to feel out various situations. They are a 
part of the preliminary work involved in formulating a system of judi- 
cial divorce statistics. Another phase of this exploration into the indi- 
vidual and social backgrounds of divorce is that of making several 
hundred case studies (preferably as many as two thousand) into the 
social and psychiatric backgrounds of family disorganization. Such 
case studies would be a contribution in their own right and they should 
be expected to throw much light on the desirable content of institu- 
tional statistics in the divorce field. Whether this phase can be 
opened up in a satisfactory way depends upon securing the necessary 
financial resources—which would of course be considerable. 

4. Still another activity is that of bringing together the different 
threads of inquiry and then formulating a system of underlying records 
and documents which will cause the desired data to be available and 
readily worked into a statistical system. 

This activity, naturally enough, has been carried only a short dis- 
tance at the present time. It is self-evident that many decisions cov- 
ering what data should be made available must be made before attacking 
the much simpler task of how to make the data available. Tenta- 
tively, it now appears that this latter task will not be very complex, 
and that a standard face sheet to accompany the original petition for 
divorce as it passes through the stages of court procedure plus a stand- 
ard appearance docket would, for most jurisdictions, constitute the 
main needs in the way of new underlying documents and records. It 
is even possible that a detachable stub might be made to serve the 
needs of state registration of divorce with an insignificant addition of 
time and labor. 

If this tentative judgment of the situation proves correct, a nation- 
wide system of judicial divorce statistics (including the necessary 
underlying documents and records), useful alike to the courts and to 
the public, could readily be formulated and at practically no added 
cost over that of the present hopelessly inadequate system, provided 
only that we can become clear what data should be secured, and can 
be secured in dependable form. 







5. Assume that a competent system of judicial divorce statistics can 
be formulated. There remains the problem of securing the actual in- 
stallation of any such system. 


DISCUSSION 
I 


By R. CLYDE WHITE


Dr. Marshall has presented a descriptive account of a research proj- 
ect concerned with divorce statistics in Maryland and Ohio, which is 
being carried on by the Institute of Law of The Johns Hopkins Univer- 
sity. The study will be interesting to students in the social sciences, 
whether or not the results should be accepted as a plan of recording and 
reporting divorce statistics, and not the least of the points of interest 
will be the method of checking the accuracy of the data which is finally 
worked out, because the value of the study will turn upon the dependa- 
bility of the data. 

It is no easy task to discuss a descriptive paper, as Dr. Marshall 
indicated in his letter to Mr. Mead and myself. The data of the study 
are not available, and, obviously, no interpretations could be given. 
The paper is concerned with what is probably significant in divorce 
statistics and with the method of procedure. All of the common cate- 
gories of social data are represented, although Dr. Marshall states that 
there is no intention of suggesting the plaintiff-defendant forms as parts 
of a future record system. “They are exploratory devices used to feel
out various situations in the preliminary work involved in formulating 
a system of judicial divorce statistics,” he says. 

It is on this point that I wish to raise one question concerning the 
method of determining what are significant data in divorce statistics. 
The plaintiff-defendant forms are frankly used as a means to that end. 
Why should not the case studies be conducted concurrently and with 
some of the same litigants for this purpose? Dr. Marshall suggests 
that, in addition to their inherent interest, the case studies may have
an exploratory value, but it appears that, while the statistical forms are
being used at present, the case studies have not begun. They have not 
been started, it must be admitted, for an excellent reason, namely, that 
there are no funds. But methodologically they should be carried con- 
currently and should include identical cases. 

The case study may be an important aid to statistical work. A few 
cases studied in great detail tend to bring out the significant facts
which can be expressed statistically. The qualitative factors which
are not amenable to statistical treatment are associated with objective 
factors which do lend themselves to statistical recording and analysis. 
But the qualitative factors alter the functioning of statistical factors. 
Apparently it is the aim of the present study to include the more elusive 
social and psychiatric influences. If such information is obtained from 
the same litigants who furnish the statistical data sought in the printed 
forms, it should increase the accuracy of judgment as to what are sig- 
nificant divorce statistics. 

In my judgment this point of view is strengthened by the results of 
some preliminary statistical skirmishing which we have done in Indiana.
We found a correlation of +.811 ± .047 between divorces per
100,000 population and persons per 100,000 population in hospitals for 
the insane in Indiana from 1900 to 1926. This is a high positive 
correlation, though it may indicate mere coincidence rather than inter- 
dependence, but it is sufficiently high to warrant psychiatric and psy- 
chometric examinations of litigants in divorce courts who are the sub- 
jects of research projects. This opinion is reinforced by the fact that 
on December 22, 1930, of 847 persons in the Indiana Central Hospital 
for the Insane who had been married, 12.0 per cent had been separated 
or divorced from their marriage partners at the time the patients were 
admitted to the hospital. This is a much higher rate of divorce 
among the insane than appears in the general population. While 
serious mental disorders appear to lead to divorce in many cases, one 
might suggest that the milder psychoneurotic tendencies probably 
land individuals in the divorce court instead of the mental hospital. 
One may not assume that this is an important factor in all divorces, but 
for purposes of investigation it might be assumed to be important in a 
considerable number of cases. 
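
The coefficient quoted above can be illustrated with a short computational sketch. It assumes, since the text does not say so, that the figure following the ± sign is a probable error of the classical form 0.6745(1 − r²)/√n. The two series below are invented for illustration and are not the Indiana data; the function names are likewise hypothetical, and no attempt is made to reproduce the reported ±.047.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def probable_error(r, n):
    """Probable error of r, ASSUMING the formula 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Hypothetical annual rates per 100,000 population (NOT the Indiana figures).
divorces = [110.0, 118.0, 121.0, 130.0, 128.0, 141.0, 150.0, 149.0]
hospital = [150.0, 155.0, 161.0, 160.0, 170.0, 175.0, 181.0, 190.0]

r = pearson_r(divorces, hospital)
pe = probable_error(r, len(divorces))
print(f"r = {r:+.3f} ± {pe:.3f}")
```

Two series that both rise steadily over a period of years will yield a high coefficient of this kind whether or not they are interdependent, which is the caution already raised in the text.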

Are persons with marked psychoneurotic tendencies or low intelli- 
gence likely to be repeaters in the divorce court? If they are, this has 
a very practical application: such persons are expensive to the com- 
munity, and they consume a large part of the time of the court. This 
is a matter of concern to the taxpayer, and, because it is, it might be an 
element in good “social reporting.” These psychoneurotic tendencies
might be associated with certain objective factors, in which case indi- 
rect statistical treatment of such data might be possible, if they are 
found to be significant. 










II 


By BENNET MEAD 


First of all, I should like to ask Mr. Marshall, what procedure is 
being used for checking the accuracy of returns in the Ohio and 
Maryland divorce studies? 

In the second place, I should like to comment on one feature of the 
Ohio divorce schedule which seems to be incomplete. There is a 
question regarding occupation, but there is no inquiry as to the indus- 
try, trade or business in which the person is engaged. The Bureau of 
the Census has found that it is very important to obtain this informa- 
tion as well as the occupation or specific type of work, partly because 
of the fact that many occupational terms are not used in a uniform 
manner. For example, “bakers” are to be found in a number of metal
working industries as well as in bakeries. Furthermore, the 1930 
classification of occupations has been worked out primarily on a basis 
of the industry group and secondarily in terms of specific occupations. 
Any data as to occupations which are obtained in sociological inquiries 
ought to be obtained in such form that they can be properly compared 
with the Census statistics of persons gainfully occupied. I fear that 
such comparison will prove difficult in the case of the Ohio divorce data. 

Mr. Marshall has set forth admirably the criteria or standards for 
judging the particular items which are being obtained in the Institute 
studies as to their value and feasibility for inclusion in a permanent 
system of court statistics. As Mr. Marshall has pointed out, such 
statistics should be of service for social reporting, that is, for throwing 
light on social conditions and trends; this being the case, it is necessary 
to recognize an inherent limitation, for social reporting purposes, of any 
statistics which are confined to the procedure in the ordinary divorce 
court. This is, that the sole legal function of the court is to reach a 
technically correct decision as to whether the couple in question may, 
under the given circumstances, be divorced. This means, in practice, 
that only the legalistic aspects of the situation are subject to statistical 
study from the existing types of court records, and that it is therefore 
difficult, if not impossible, to arrive at a real understanding of the 
family maladjustment which is the actual social problem. We must 
deal statistically with divorce largely in the same way as the old- 
fashioned doctor, who confined himself to treating symptoms, and 
made little or no effort to understand the underlying pathological 
conditions. 







It is worth while to call attention to the fact that the feasibility of 
obtaining any given data from the court officials is not a fixed quantity, 
but varies according to the effectiveness of the incentives which are 
applied. The Institute of Law is giving a very effective demonstration 
of the fact that fairly detailed data may be obtained from ordinary 
court officials, if the proper incentives are utilized. 







SOME SUGGESTIONS FOR IMPROVING OUR INFORMATION 
ON WHOLESALE COMMODITY PRICES 


By MORRIS A. COPELAND


It is generally recognized that our information on retail prices is far 
from adequate. The problem of organizing such information is 
extremely complex. It is not so generally realized that our informa- 
tion on wholesale prices is a baffling array of unorganized and inade- 
quately evaluated data. But the prospect for improvement of our 
wholesale price data is more promising than that for retail data. 
Consequently, the Committee on Price Statistics has thought it wise 
to concentrate attention for the present on wholesale prices. 

In emphasizing the inadequacies of our information on wholesale 
prices I do not wish to be understood as underrating the great progress 
which has been made in this field in recent years. Our information 
is distinctly better today than that of any other important nation 
with the possible exception of Germany. But American economists 
are still far from possessing a good basic technique for the scientific 
study of prices, a good technique of price measurement. 

In discussing the possibilities of improving our wholesale price 
information I propose to confine my attention chiefly to the problems 
which arise in connection with the measurement of individual com- 
modity price movements and the construction of commodity composite 
prices or individual commodity price index numbers. There are 
important fields of wholesale prices for which little or no reliable 
information is now available, notably for highly elaborated manu- 
factured articles and for branded goods. In many other fields of 
wholesale prices we have a considerable amount of data, but we are 
confronted with a difficult problem of sampling. Thus for some 
commodities we have a number of “pure price series,” each a set of
quotations on a standard grade for standard terms and a definite 
market, but we do not have any organized information to tell us the 
relative importance of the various series. For a few commodities we 
have, in addition to the pure price series, commodity index numbers 
or composite prices which purport to be representative of the market 
movements as a whole, but the representativeness of these index 
numbers is in many cases unfortunately open to question; thus, for 
bituminous coal we may compare two index numbers, that of the 
Bureau of Labor Statistics and that of the Coal Age. The B.L.S. 
index for this commodity on the 1926 base was 38 per cent in 1913.
The corresponding figure for 1913 according to the Coal Age was 56
per cent. For 1929 the B.L.S. figure was 91 per cent and the Coal 
Age figure was 81 per cent. For annual averages these are striking 
discrepancies. The B.L.S. index refers to central markets, the Coal 
Age to F.O.B. mine prices, but the discrepancies are too large to be 
explained on this basis alone. For coal and for a number of other 
commodities which I cannot stop to illustrate we badly need a critical 
evaluation of existing indexes and probably some alterations in the 
methods of their construction. 
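
The size of the discrepancy can be seen by working out the arithmetic on the four figures just quoted. The sketch below uses only those figures (1926 = 100); the variable and function names are illustrative. On the B.L.S. index the 1929 level stands at about 239 per cent of the 1913 level, while on the Coal Age index it stands at about 145 per cent.

```python
# Index numbers quoted in the text (1926 = 100).
indexes = {
    "B.L.S.": {1913: 38.0, 1929: 91.0},
    "Coal Age": {1913: 56.0, 1929: 81.0},
}

def relative_change(series, start, end):
    """Ratio of the index in year `end` to the index in year `start`."""
    return series[end] / series[start]

for name, series in indexes.items():
    ratio = relative_change(series, 1913, 1929)
    print(f"{name}: 1929 stands at {100 * ratio:.0f} per cent of 1913")

# Point differences between the two indexes in each year (Coal Age minus B.L.S.).
gap_1913 = indexes["Coal Age"][1913] - indexes["B.L.S."][1913]
gap_1929 = indexes["Coal Age"][1929] - indexes["B.L.S."][1929]
print(f"gap in 1913: {gap_1913:+.0f} points; gap in 1929: {gap_1929:+.0f} points")
```

The gap between the two indexes not only is large but changes sign between the two years, which a constant difference between central-market and F.O.B. mine prices would not produce.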

The thesis which I propose for your consideration is that our whole- 
sale price information would be greatly improved if we had for each 
of the forty or fifty most important commodities the following informa- 
tion: (a) A fair sample set of pure price series with full and precise 
specifications for each. (b) The organization of the sample series into 
one or more commodity composite prices for each commodity. Mills 
defines a pure price series as “one which relates to a homogeneous
commodity, which is drawn from a single market . . . , and which
is derived throughout from the same type of transaction.”¹ In a
pure price series reworking and interpretation of the original quotations 
should be at a minimum. Comparability of items throughout the 
series should be fully maintained by a standardized set of specifications 
as to grade, time and method of securing quotations, market and terms 
of sale. A change of specifications marks the end of one pure price 
series and the beginning of another. Such changes should be fully 
noted in publication. A commodity composite price or individual 
commodity index number is an average of several pure price series 
representing different grades or markets but all presumably at the 
same stage of production or distribution. It purports to be repre- 
sentative of a given set of grades and markets. It also enables us to 
bridge the gaps involved in changing specifications, so as to give a 
continuous price history. Whether such composites are needed at 
daily, weekly or monthly intervals depends chiefly upon variability 
or stability of the price in question. The distinction between a pure 
price series and a commodity composite price is of fundamental 
importance to accurate work in the measurement of price changes. 
In order to interpret our pure price series and commodity composite 
prices for leading commodities, in order to be able to judge the fairness 
of a sample set of pure price series or to be able to revise it, in order to 
keep the commodity composite prices representative, we need period- 
ically for each important commodity or industry a study of data related 
to prices which will tell us the chief sources of supply and methods of
production, the chief types of demand, the relative importance of the
various grades and markets, the forms of market organization, and the
methods and terms of sale.

¹ F. C. Mills, Behavior of Prices, 1927, p. 35.
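
The distinction between a pure price series and a commodity composite price may be made concrete with a small sketch. It assumes one possible method, a chain of weighted averages of price relatives computed only over series quoted in both adjacent periods; the text does not prescribe this or any other formula. The function name `chained_composite`, the quotations, and the weights are all hypothetical.

```python
def chained_composite(prices, weights, base_value=100.0):
    """Chain-linked composite from several pure price series.

    prices:  {series_name: [price or None per period]}; None means the series
             does not exist in that period (before it starts, or after a
             change of specifications has ended it).
    weights: {series_name: relative importance}; hypothetical fixed weights.
    """
    n_periods = len(next(iter(prices.values())))
    composite = [base_value]
    for t in range(1, n_periods):
        # Use only series quoted in both periods, so that a change of
        # specifications never enters the composite as a price change.
        matched = [
            name for name, p in prices.items()
            if p[t - 1] is not None and p[t] is not None
        ]
        if not matched:
            raise ValueError(f"no series spans periods {t - 1} and {t}")
        total_w = sum(weights[name] for name in matched)
        link = sum(
            weights[name] * prices[name][t] / prices[name][t - 1]
            for name in matched
        ) / total_w
        composite.append(composite[-1] * link)
    return composite

# Invented quotations: grade A is respecified after period 2, so the old
# pure price series ends and a new one begins, overlapping for one period.
prices = {
    "grade A, old spec": [2.00, 2.10, 2.20, None, None],
    "grade A, new spec": [None, None, 2.50, 2.75, 2.75],
    "grade B":           [1.00, 1.00, 1.10, 1.10, 1.21],
}
weights = {"grade A, old spec": 2.0, "grade A, new spec": 2.0, "grade B": 1.0}

composite = chained_composite(prices, weights)
print([round(v, 1) for v in composite])
```

In this illustration the difference between the old and new grade A quotations in the overlap period (2.20 against 2.50) is treated as a difference of specification, not of price, so the composite remains continuous across the change.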

As a first step toward improving our wholesale price information, the 
Committee on Price Statistics proposes a general survey and critical 
analysis of the work of the chief price compiling agencies. Such a 
survey would be directed so as to determine, among other things, the 
chief gaps in our information on individual commodities, the most 
important pure price series needed, the principal price composites 
which are lacking. Such a survey would also presumably point out the 
chief existing pure price series and commodity price composites which 
need specific study and evaluation. The second proposal of the 
Price Committee is a series of specific commodity and industry studies. 
These would be concerned in part with the appraisal of existing pure 
price series and commodity composites, and with making suggestions 
for their improvement. 

I should like to point out some of the advantages of having a set 
of commodity composite prices for some forty or fifty leading com- 
modities. There are a number of advantages in connection with the 
construction of general and group price index numbers. It is a 
statistical commonplace that we need different index numbers for 
different purposes. This is true, of course, of individual commodity 
composite prices as well as of broad general indexes, but not in the 
same degree. A set of composite prices, one for each commodity, can 
usefully be combined in a great variety of ways for a great variety of 
purposes. The B.L.S. wholesale price index groups commodities 
partly on the basis of source of material, partly on the basis of use, and 
partly on the basis of stages of production. Because it aims to be a 
general-purpose grouping it is necessarily ill-adapted to any specific 
purpose. And it is difficult to use the B.L.S. data to get regroupings 
for special purposes, because for many commodities, such as wheat, 
cotton, cattle, potatoes, eggs, butter, milk, wool, petroleum, gasoline 
and semi-finished steel, no commodity composites are computed. 
Incidentally, the lack of such composites makes it difficult to analyze 
the causes of changes in the index, and to check the satisfactoriness of 
the B.L.S. sampling against other price data. Individual composite 
prices would greatly increase the general usefulness of the B.L.S. 
indexes, and would probably help to increase the accuracy of the 
indexes. 
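
How a single set of commodity composites might be regrouped for different purposes can be sketched briefly. The sketch assumes a weighted arithmetic mean as the method of combination, which is only one of several possible formulas; the composite values, weights, and groupings are invented, and only the commodity names are taken from the list above.

```python
def group_index(composites, weights, members):
    """Weighted arithmetic mean of commodity composites for one group."""
    total_w = sum(weights[c] for c in members)
    return sum(weights[c] * composites[c] for c in members) / total_w

# Hypothetical commodity composite prices (base period = 100) and weights.
composites = {"wheat": 80.0, "cotton": 70.0, "petroleum": 90.0, "steel": 100.0}
weights = {"wheat": 3.0, "cotton": 2.0, "petroleum": 3.0, "steel": 2.0}

# Two illustrative groupings of the same four composites.
by_source = {
    "farm products": ["wheat", "cotton"],
    "mineral products": ["petroleum", "steel"],
}
by_use = {
    "foods": ["wheat"],
    "industrial materials": ["cotton", "petroleum", "steel"],
}

for label, grouping in (("by source", by_source), ("by use", by_use)):
    for group, members in grouping.items():
        value = group_index(composites, weights, members)
        print(f"{label}: {group} = {value:.1f}")
```

Because the composites are kept separate from the grouping, a new classification requires only a new mapping of commodities to groups and no return to the original quotations.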

We have at present no official general weekly price index. The 
current compilation of general weekly indexes would be greatly facilitated
if we had an accurate weekly composite price for each of the
principal commodities which show wide weekly fluctuations. At the
present time we lack such composites for such important commodities 
as bituminous coal and lumber. 

Individual commodity price composites are useful, if not essential, 
to the study of the behavior of individual prices in relation to produc- 
tion, inventories, capacity and consumption data. In the first place, 
our present information is defective because the pure price series and 
physical and dollar volume data now available are often not correlative. 
The price series refer to sample grades and markets only. For proper 
comparisons we need either a detailed breakdown of the physical and 
dollar volume data, or a composite price representative of all the 
grades and markets. In the second place, pure price series by their 
very nature cannot take account of changes in quality and design, in 
terms of sale, and in the importance of various markets and brands. 
The problem of taking account of these changes in our composite prices 
is a vexing one, and one badly handled, for example, in the present 
B.L.S. index of automobile prices. But a more satisfactory handling 
of the problem is necessary in almost any study of the relationships of 
price to physical and dollar volume data, or in any attempt to in- 
vestigate the changing margins between different stages of production 
and distribution. In other words, until we get good composite prices, 
we lack the basic measurements for a scientific study of the causes of 
price changes, a set of price data which isolate price changes from 
changes in the character of the thing priced, and which are properly 
comparable with physical and dollar volume data. 

The problem of improving the quality of existing series of quotations 
is a far more important one. So great is the variety of practices 
followed by the various agencies collecting and publishing quotations, 
that we can hardly speak of standards of practice today. Specifica- 
tions are often incomplete and inaccurate, and changes in specifications 
are often inadequately noted. In many cases, if not in all, some 
reworking of the quotations as collected is necessary before publication. 
The wide diversity in present methods and extent of reworking basic 
data before publication makes it difficult to tell what can properly 
be regarded as a pure price series and what can not. Trade publica- 
tions are notoriously careless in their handling of quotations. They 
have, however, an important advantage over a central price organiza- 
tion like the B.L.S.: they know the trade facts and trade gossip. 
Accurate compilation of pure price series without such information is 
probably impossible, because one has no way of judging the reasonable- 
ness of a quotation. The problem of identifying purely nominal 
quotations in a period of price concessions and of what allowance, if
any, to make in such a case, is an especially puzzling one. In short we
may fairly characterize the conditions in the gathering of basic price 
data and in preparing them for publication as chaotic. We badly 
need a set of standards of statistical practice in this field, and the 
drawing up of such standards is an important part of the project of the 
Committee on Price Statistics. Indeed, if the Committee can succeed 
in drawing up a good set of standards of practice for gathering and 
compiling pure price series, this accomplishment alone will in my 
opinion more than justify the Committee’s existence. The best cur- 
rent practices in the collection of price information are undoubtedly 
represented by the work of the Bureau of Agricultural Economics. 
I venture to suggest that in the effort to raise the standards of practice 
in the handling of price quotations it may be possible to follow the 
example of the Bureau of the Census in the field of vital statistics and 
establish a “registration area” or white list of price series which are
compiled according to standards of an approved character. 

The second aspect of the problem of improving our basic wholesale 
price information has to do with the organization of the pure price 
series for each commodity into one or more composite prices. Both 
the development of price composites which are now lacking and the 
critical evaluation of existing series are of paramount importance here. 
I shall mention only three of the more troublesome phases of these 
tasks: (1) As an ideal at least, we should like to have published the 
basic pure price series that go into every composite price, and precise 
specifications as to the methods of compilation. In cases where the 
basic data are confidential this is of course impossible. In such cases, 
therefore, we need especially to be assured of the high standards 
employed in the collection and compilation of data. It would be well 
if the certification of these standards could be made by some agency 
other than that compiling the price composite. (2) In many trades 
the situation is complicated by the existence of both spot and contract 
prices. In many of these cases it may be desirable, therefore, to have 
more than one composite. Just how such a situation should be handled 
in any particular case is a problem that calls for careful study. At 
present there is little in the way of standard practice in handling it; and 
in the construction of general price indexes the very distinction be- 
tween spot and contract prices is inadequately recognized. (3) It is 
a nice question as to what constitutes a commodity, and this also calls 
for special study of each case. But I think it is clear that present 
practice on this point is in need of revision. When the B.L.S. calls 
its 18 butter quotations 18 commodities, and its dozen or more cement 
quotations one commodity, the term commodity is clearly being
stretched. And such stretching tends to obliterate the important
distinction between a pure price series and a commodity composite 
price. Finally, let me say with regard to the service of maintaining a 
commodity composite price that it is one that calls currently for special 
knowledge of the trade involved. Unless a central bureau is prepared 
to keep in touch with the factors in a trade which bring about price 
changes, it cannot hope to maintain a composite price or a fair sample 
set of quotations that accurately reflect price changes as they occur. 

It is obvious that the task of improving our information on individual 
commodity wholesale prices, so that it will provide the basic measure- 
ments essential to a scientific study of price movements, is a large one, 
both on the side of establishing standard practices in the handling of 
quotations, and in the construction of commodity composites which 
shall measure only price changes and which shall be typical of the mar- 
kets to which they refer. It calls for a carefully planned program of 
research. But it is clear that the task cannot be done once and for all. 
The task of revising and improving our information will be always with 
us. Because of the continuing nature of the problem the work of the 
Price Committee can only be one of making a start in a direction where 
a new impetus is greatly needed. And because of the continuing nature 
of the problem I venture to conclude with a tentative suggestion regard- 
ing the ideal division of labor in the handling of our price information. 
The collection of data and the compilation of price composites call for 
special current knowledge of individual trades and industries. This 
work can best be performed by special agencies working on these 
special trades and industries. But there will always be need for a 
central agency not only to compile general and group price indexes, 
but also to coordinate and supplement the work of the special agencies
and to supervise their methods and encourage the improvement of 
their standards of practice. Whether such a division of labor will be 
feasible is perhaps one of the questions into which the Price Committee 
may eventually see fit to inquire. 







ON THE USE OF INDEX NUMBERS OF PRICES IN THE 
STUDY OF ECONOMIC CHANGES 


By FREDERICK C. MILLS


Historically, index numbers of prices have been constructed for 
the primary purpose of measuring changes in the value of money, and 
this remains today a major objective. But this is not the only purpose 
which such measurements now serve. Index numbers of prices are 
widely followed for the light they throw on the processes of production, 
exchange and distribution, and on the dynamic changes, secular or 
cyclical in character, which affect the operations of an economy.
This purpose is, of course, related to the first, but it does not follow 
that what is an adequate index number for the first purpose will be 
satisfactory for the second. Current index numbers of prices do not, 
in fact, yield the information we should like to have concerning 
economic changes. The tremendously important movements of the 
last eighteen months, for example, are far from clear to us. One 
reason for this is found in the deficiencies of the price record. 

Three main requirements of price index numbers intended for use 
in the study of economic changes may be set down. They should be 
based upon comprehensive compilations of price series, representative 
of transactions in all important commodity markets. They should be 
so constructed that changes occurring in different parts of the price 
system might be followed. They should be comparable with index 
numbers measuring changes in non-price elements of the economy. 
Each of these points requires brief elaboration. 

The character of the individual series which should be used in con- 
structing index numbers of prices has been ably discussed in Professor 
Copeland’s paper. His emphasis is on the intensive aspects of the 
problem before us—on the improvement of the quality of the individual 
bricks to be used in our building. I should endorse most heartily all
that he has said, urging only that the price-compiling authorities give 
us the “pure” price series in unadulterated form, as well as the derived
composite prices. For such “pure” series, relating, over a period of
time, to precisely the same commodity, to the same market, and to 
transactions in which buyer and seller are clearly defined, are essential 
raw material for all work in the field of prices. 

But more bricks are needed, as well as better bricks. If an economic 
intelligence service is to function properly we sorely need more information
concerning the numerous price transactions through which
economic activities are carried on. It is customary, and proper, in
compiling data for use in an index number, to emphasize the im- 
portance of securing quotations representative of all important indus- 
tries and of all important commodities. Not so much attention has 
been paid to securing quotations representative of all parts of the 
country and of all manufacturing and merchandising stages through 
which goods pass between producer and ultimate consumer. It is 
true that various markets and various productive and distributive 
stages are represented in current price compilations, but they are 
represented in haphazard fashion. The usual collection of “whole-
sale” prices contains a heterogeneous assortment of quotations relating
to transactions carried on by all sorts of buyers and sellers in all sorts 
of markets. The prices of manufactured goods which are available are 
in some cases prices charged by manufacturers; in other cases they are 
prices charged by dealers several stages removed from the manufac- 
turer. Unless we carefully distinguish different types of price trans- 
actions and different distributive stages, our data are of but limited 
utility in the following of economic changes. 

An adequate collection of price statistics, representing all important 
phases of our economic life, would serve many purposes. It would 
throw light on marketing operations, and on changes in marketing 
methods and conditions. It would enable the industrial and regional 
incidence of business cycles to be studied, and changes in profit margins 
to be followed. Regional differences among prices could be accurately 
measured. The nature of competitive prices could be more adequately 
investigated. The information yielded by such quotations would 
bear immediately on projects for stabilization of the price level, and 
would be essential in judging of the validity of such projects. I think 
there is no more promising field than this, for the extension of our 
knowledge of economic processes. 

The second requirement I have suggested is that price index numbers 
should be made to yield far more information than they now do con- 
cerning the changes occurring in different parts of the price system. 
This means that our basic price data should be classified in more 
significant ways, and that more attention should be paid to the con- 
struction and interpretation of group index numbers. No one is as 
yet prepared to say what are the most significant groupings of com- 
modities. In fact, no definitive groupings may be laid down, for the 
classification to be employed in a specific case will depend upon the 
purpose of the investigation. There are, however, a number of 
classifications which are of obvious economic significance. Only a 
few of these are employed in current work on index numbers. 







1. Geographical differences in price movements should be 
represented in a classification of price quotations. The behavior of 
commodity prices varies from section to section of this country, and 
this diversity of behavior should be reflected in our index numbers. 

2. The prices of raw and processed goods should be separated. 
This classification is perhaps not so significant in itself as when 
used in conjunction with other classifications. 

3. A distinction should be made between the products of Ameri- 
can farms and all other products. Under each of these heads the 
prices of raw and of processed goods should be separately studied. 

4. A further division of somewhat the same sort has been made 
in the past, and should be revived. This is a classification of price 
quotations into those relating to forest products, farm crops, 
animal products and mineral products. Here, again, raw and 
fabricated goods should be distinguished under each head. 

5. Accurate information concerning the buyer and seller repre- 
sented by every series of price quotations would permit the con- 
struction of index numbers relating to successive steps in the 
movement of goods from original producer to final consumer. 
A comprehensive series of such index numbers would throw much- 
needed light on the inner workings of the economic system. 

6. A distinction between producers’ goods and consumers’ 
goods is of the greatest aid in following economic movements, but 
the full significance of this classification is lost unless subdivision 
be made under each of these heads. Raw and processed goods 
should be separated. Producers’ goods should be broken down 
into those which are intended for use in the construction of
capital equipment and those which are destined for human con- 
sumption. Consumers’ goods, again, should be classified into 
those which are perishable, semi-durable and durable. 
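
The classifications listed above can be illustrated with a minimal sketch of group index numbers. The quotations, the labels attached to them, and the use of an unweighted mean of price relatives are all assumptions made here for illustration; the paper prescribes none of them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical quotations: (commodity, base price, current price, labels).
# Every figure and label here is invented for illustration.
QUOTATIONS = [
    ("wheat",       1.00, 0.70, {"origin": "farm",     "stage": "raw"}),
    ("flour",       5.00, 4.25, {"origin": "farm",     "stage": "processed"}),
    ("copper",      0.18, 0.10, {"origin": "non-farm", "stage": "raw"}),
    ("copper wire", 0.20, 0.12, {"origin": "non-farm", "stage": "processed"}),
]


def group_index(quotations, keys):
    """Unweighted mean of price relatives (base = 100) for each group."""
    groups = defaultdict(list)
    for _name, base, current, labels in quotations:
        group = tuple(labels[k] for k in keys)
        groups[group].append(100.0 * current / base)
    return {group: mean(relatives) for group, relatives in groups.items()}


# The same quotations regrouped under two different classifications.
by_stage = group_index(QUOTATIONS, ["stage"])
by_origin_and_stage = group_index(QUOTATIONS, ["origin", "stage"])
```

Because each quotation carries its own labels, an experimental classification needs only a different list of keys, which is the kind of ready regrouping the speaker asks the compiling agency to make possible.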

This brief list does not by any means exhaust the classifications 
which might be made in attempting to locate commodity groups which 
are subject to distinctive price-determining forces. Prices of goods 
imported and of goods exported are affected by different influences; 
prices of goods produced primarily by large establishments and of those 
produced by small establishments differ from each other in their be- 
havior; the prices of goods in which the cost of labor is a prime factor 
differ in their behavior from the prices of goods in which labor cost is 
of small importance. These and numerous other classifications are 
being tested in work we have under way at the National Bureau of 
Economic Research. Such detailed study is probably a task for the 
private investigator, and not for a governmental body. The agency
which compiles the quotations should, however, present its materials
in such shape and should give such detailed information that experi- 
mental classifications may be readily made. 

As a final requirement I have suggested that price index numbers 
to be used in the study of economic processes should be directly com- 
parable with similar measurements relating to the behavior of non- 
price elements of the economy. Recent years have seen a mushroom 
growth of index numbers of all sorts. There are index numbers of 
production, of profits, of employment, of earnings and of various other 
types. All these index numbers help to throw light on current eco- 
nomic problems, but they lose much of their significance because they 
are not comparable, one with another. This lack of comparability 
is an inevitable outcome of their development, and of the conditions 
under which they have been constructed. Numerous public and 
private agencies have compiled the statistical records. In certain 
fields the available data have been relatively numerous; in others they 
have been scanty and inadequate. But at present the materials avail- 
able in a number of economic fields are sufficiently comprehensive 
to justify more serious effort to build up a system of index numbers 
which would permit changes in prices, production, distribution, stocks, 
employment, earnings and profits to be studied, not in isolation, but 
each with reference to concurrent changes in other fields. Something 
has already been done to secure this comparability, but only a small 
beginning has been made. When such comparability has been secured
on a wide scale, when the picture revealed by our statistical records
reflects the essential unity of the underlying processes of the economy, 
we shall be in a position to exploit those records to the full. 

In conclusion, I would urge the necessity of a fresh consideration 
of the problem of securing an adequate sample of the price transactions 
which record contemporary economic activity, and a review of the 
purposes to be served by index numbers of prices. Among these 
purposes I would place major emphasis upon the needs of an economic 
intelligence service. It is, I think, a serious reflection upon the state 
of economic science that such a collapse as the present could have come 
upon us virtually without warning, and that our knowledge of the 
existing situation, and of the road we must travel before prosperity 
may be restored, is so meager. One way to improve the science is to 
improve our intelligence service, to discover what is fundamental in 
the following of economic changes, and to compile a more satisfactory 
current record of these changes. This task involves the securing of 
more and better information concerning the movements of prices, in 
relation to other economic changes, than we have heretofore had. 







THE GENERAL STRUCTURE OF WHOLESALE PRICES


By Robert W. Burgess


The purpose of price index numbers is to provide a convenient and
accurate summary of price changes as a means for increasing our un-
derstanding of differences in price conditions at two or more different
times or in two or more different markets. Inasmuch as the substance
of any field of knowledge is more important than the implements used
to cultivate it, and should determine the character of those implements,
it is desirable at present, in my opinion, to devote more effort to index 
numbers of the prices of completely manufactured commodities of 
changing design, and of services, than to further refinement of index 
numbers of standardized basic materials. This change of emphasis is 
desirable in order to cultivate the whole field rather than merely a part 
of it. It is also desirable to develop and to secure general acceptance 
of a somewhat more varied technique to facilitate the summaries of 
various price conditions. For instance, a comparison of the food 
cost of living now and twenty years ago should include somehow the 
change in death rates due to dietary improvements made possible by 
the improved functioning of the food supply. 

My original intention in connection with this discussion was to 
discuss not the general purpose of index numbers or the general price 
structure, but a particular question, namely, the conditions under 
which price index numbers can be used to convert a series of values— 
sales, exports and imports, plant valuations—into a series of physical 
volumes. As I tried to work out my argument, however, it seemed to
me that the real reasons for differences of opinion on this technical
matter were different conceptions of what the general structure of
wholesale prices is and what the important problems in connection
with prices are.
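
The conversion of a value series into a physical volume series, mentioned above, can be sketched as follows. The figures are hypothetical, and the sketch assumes that the price index really does describe the goods making up the value series, which is precisely the condition the speaker calls into question.

```python
def deflate(values, price_index, base=100.0):
    """Express a money-value series in base-period prices.

    Each value is divided by the price index for the same period,
    with the index expressed relative to `base`.
    """
    if len(values) != len(price_index):
        raise ValueError("series must have the same length")
    return [v * base / p for v, p in zip(values, price_index)]


# Hypothetical sales in dollars and a hypothetical price index (base = 100).
sales_dollars = [1000.0, 1100.0, 900.0]
price_index = [100.0, 110.0, 75.0]

volume = deflate(sales_dollars, price_index)
```

If the index is dominated by standardized raw materials while the sales consist of fabricated goods, the division still runs, but the resulting series no longer measures physical volume.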

Some economists and statisticians seem to regard the wholesale 
prices published in various periodicals as typical of all prices, whereas 
it seems to me that they constitute a highly specialized class subject to 
different laws than many prices which can never be quoted daily, 
weekly or monthly because of their confidential nature, the uniqueness 
or lack of standardization of the item priced, or rapid changes in design 
or specifications, which would make a published quotation misleading 
to non-specialists. 

Some economists, again, seem to think that the overshadowing 
problem of commodity prices is that of the value of gold in terms of
commodities, whereas it seems to me that there are many price prob-
lems of equal or greater importance, such as: (1) That of the commodity
statistician: forecasting future prices of individual commodities by
means of interrelationships between commodities or of relations to
general business or the progress of particular industries; (2) That of
the purchasing agent: appraising the fairness of prices on the basis
of costs of raw material, cost of the fabricating process and reasonable
profit; (3) That of the economic historian: estimating the real degree
of progress made by our material civilization in a decade or a century;
(4) That of the investment trust statistician: determining the effect
of prices of commodities and services on the profit-earning character-
istics of the industry selling or buying these commodities and services;
(5) That of the economist with political leanings: developing, per-
haps, more satisfactory methods of arriving at prices in certain fields,
for instance, the oil fields.

In order to start to solve any of these problems, it is necessary to 
assemble many facts about prices, some of which are not readily 
accessible in published form and others of which are closely guarded 
secrets. The prices that are or may be made available to an impartial 
organization, however, are so complicated that some summarizing or 
classification is necessary. Two major principles of classification are, 
of course, those based on the date and the place. I suggest two other 
main principles for such classification of wholesale prices, with the
thought, not that all prices can be pigeon-holed exactly under the 
classes to be suggested, but that the process of attempting to assign 
particular prices to these classes will bring out their real nature even 
if it is sometimes found that a price falls to some extent into two 
parallel classes. For this purpose, I am defining a price, or perhaps I 
should say, a price determination, to be an agreement by a buyer and 
a seller as a basis for the transfer of some commodity or service, the 
circumstances under which the bargain is concluded being considered 
an essential part of the price determination. The two principles of 
classification are: (1) Degree and type of competition among sup- 
pliers and purchasers; (2) Stage in the industrial process from un- 
modified natural product to actual consumption goods or services. 

Taking up the principle of degree of competition, two extreme classes 
have been pretty thoroughly discussed by economists; namely, first, 
the free market, in which many suppliers of relatively small amounts 
offer to many purchasers of relatively small amounts completely 
standardized articles ready for immediate delivery, or to be ready at a 
definite future date; and second, the monopoly, such as that of a 
patented article or a copyrighted book, in which there is only one
supplier but many purchasers. The usual assumption seems to be
that the aim of the monopolistic supplier is to secure maximum net 
returns. 

In addition to these two well-recognized classes, it seems appropriate 
to mark out several others, such as: (3) the made-to-order class, and 
(4) the repeated order class. 

Under (3), the article desired may differ from the standard article 
in such a way that special machinery or greater care is required. For 
instance, the paper used for telephone directories is required to meet 
more exacting specifications than ordinary newsprint. Obviously, in 
such cases, the negotiations between the supplier or suppliers and the 
purchaser are governed by different considerations than in the case of 
the free market. If the product resembles something handled in a 
free market, the price will obviously be somewhere near the price 
determined in the free market, but certain possibilities of the free 
market are practically impossible. For example, sacrifice sales far 
below the cost of production are unlikely, and conversely, prices giving 
an abnormally large return on investment are unusual, or at least 
temporary. 

Under (4), in order to secure quality and service of a certain desired 
character, the buyer may desire to cultivate long-term relationships 
with his suppliers. Under such circumstances, free discussion and 
mutual adaptation replace the higgling of the free market. Cost of 
production has a much more prominent place in price determination 
than in the free market. It is perhaps of interest to remark that de- 
termining cost of production to the satisfaction of both buyer and 
seller is a matter for considerable discussion in view of the varied 
methods which are possible in regard to such things as depreciation 
rates and the interest rate on investment as a part of costs. 

To those who have set their minds to the conditions of the ordinary 
market place, certain points in the attitude of an up-to-date purchasing 
agent might seem startling. For instance, a purchasing agent who 
might be requested to write into the proposed contract a price low 
enough to make the would-be seller unquestionably the low bidder, 
would probably refuse to accede to this request on the ground that 
any basis entailing serious and arbitrarily-set loss to the supplier would 
merely ruin him and prevent the establishment of the desired lasting 
relationship. 

Another thing that happens to prices under these conditions is that 
one supplier who is just entering the field may obtain temporarily, on a 
small experimental order, a higher price than others who are long 
established. The simple act of summarizing prices effective at a given
date therefore calls for judgment whether or not to include such a
contract in finding the average. 

I have perhaps said enough to indicate how prices may be classified 
by degree and type of competition. The second principle of classifica- 
tion, which should perhaps be even more emphasized, is classification 
according to stage in the industrial process. 

In analyzing the factors bearing on price with a view to making 
estimates for the next six months or the ensuing year or for a long term, 
the first step in my experience is to find out or to recall to what extent 
the commodity in question falls in one or the other of these classes—by 
type of competition or by stage in industrial process. With no claim 
that the categories are either all-embracing or mutually exclusive, I 
submit the following for consideration: 

(a) Natural Resources. The chief characteristics of this class are 
that the commodity is standardized and that supply at a given time 
cannot be changed promptly to meet demand and that, therefore, 
price and cost of production may separate widely for considerable 
periods. Prices of commodities of this group are as a rule determined 
in an open market and are in most cases definitely a matter of published 
record. This category has therefore furnished most of the material 
acceptable to index number makers. 

This category breaks up into some five main sub-categories: (1) 
annual crop agricultural products; (2) long cycle agricultural products, 
like rubber, coffee and beef cattle; (3) products of mines; (4) products 
of forests; (5) products of the sea. 

(b) Agricultural or Mineral By-Products, such as hides. For this 
class, price is determined almost entirely by demand, with supply 
inelastic within ordinary ranges. 

(c) A class which I shall label “By-Purchases”; i.e., articles such
as nails and screws which constitute a minor element in the final manu- 
factured product. The price of this class is determined almost solely 
by supply, with demand inelastic within ordinary ranges. 

(d) Value Added by Fabrication. As the process of fabrication be- 
comes highly specialized and much of the end product is not completely 
standardized, this category is much more difficult to deal with from the 
point of view of the index number makers. As a matter of fact, its 
existence has been disguised by merger with category (a) in that total 
prices for the end product are used in index numbers rather than prices 
for value added by fabrication. For instance, many index numbers 
include prices of copper wire, brass rod and brass sheet in addition to 
the price of electrolytic copper. For purposes of economic analysis, 
it seems to me that the differentials should be studied rather than
these total prices, as in the case of the wire drawing differential, the
price of copper wire per pound less the price of copper per pound. The 
independence of value added by fabrication is indicated by the fact 
that the published quotations for copper wire show that the wire draw- 
ing differential was 1.88 cents per pound on January 2, 1929, when 
copper was 16.75 cents per pound, and remained at that figure while 
copper went up to 24 cents and back to 18 cents. A slight decline 
from 1.88 to 1.75 cents occurred when copper broke to 14 cents, but 
the further decline to 9.50 cents brought no additional change in the 
differential. These facts are obscured if attention is concentrated on 
the quotation for copper wire. When an index number includes both 
copper wire and electrolytic copper, it, in effect, gives double weight to 
electrolytic copper and only a single weight to the fabrication dif- 
ferential. 
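
The arithmetic of the wire drawing differential can be set out in a short sketch. The copper prices and differentials are those quoted in the text; the wire prices are not quoted there and are derived here, as an assumption, by adding the differential back to the copper price. Only the first quotation is dated in the text.

```python
# (copper price, wire drawing differential), both in cents per pound,
# in the order the text gives them. Only the first is dated (January 2, 1929).
QUOTES = [
    (16.75, 1.88),
    (24.00, 1.88),
    (18.00, 1.88),
    (14.00, 1.75),
    (9.50, 1.75),
]


def relatives(series):
    """Each value as a percentage of the first value in the series."""
    return [100.0 * x / series[0] for x in series]


copper = [c for c, _ in QUOTES]
differential = [d for _, d in QUOTES]
# Derived, not quoted: wire price = copper price + differential.
wire = [c + d for c, d in QUOTES]

copper_rel = relatives(copper)
differential_rel = relatives(differential)
wire_rel = relatives(wire)

# An aggregate of the copper and wire quotations contains the copper
# price twice and the differential only once.
aggregate = [c + w for c, w in zip(copper, wire)]
```

On these figures the copper relative ends near 57 and the wire relative near 60, while the differential relative ends near 93, so the wire quotation taken alone reflects little besides the movement of copper.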

A preliminary analysis of fabricating differentials for brass rod and 
sheet and nickel silver shows some minor changes in the fabricating 
differential as the price of the basic raw materials changes. As a 
general statement in order to stimulate discussion, we may dogmatize 
as follows: Fabricating differentials have not changed much over the 
last two years except in cases where competition has been of a cut- 
throat character, but the official index numbers have failed to bring 
this out because the manufactured commodities sub-index has been 
really a composite, summarizing both the prices of the fabricating 
process and the prices of the raw materials. 

It may be suggested that careful analysis would probably show that 
the fluctuations of the copper wire drawing differential, of the differen- 
tial for preparing automobile steel sheets, and of the fabricating differ- 
ential for preparing cotton yarn from raw cotton are more like 
each other than they are like the price changes of the basic raw 
materials. 

May I suggest that this class, “Value Added by Fabrication,” seems
one of great importance in which additional effort promises to be 
peculiarly profitable. There are two sub-classes of special significance, 
one including items like the wire drawing differential, in which the 
basic data are largely a matter of record requiring only to be brought 
together by an unbiased agency; and the other including important 
highly fabricated items like automobiles, for which published price 
indexes fail to present a picture of the true price change from year to 
year. With the cooperation of engineers in the automobile industry,
for instance, I believe a divorce of apparent price change due to changes 
of design and true price changes could be accomplished, as we believe 
we have accomplished similar separations in the telephone industry. 


































(e) The Value Added by Distribution Category, including Value 
Added by Transportation. In comment on these classes, the prices 
of a large number of highly fabricated commodities seem to be in 
large part built up by elements of value of the preceding categories. 
This is especially true of producers’ goods bought in large quantities by 
a progressive Purchasing Department addicted to what used to be 
called “scientific purchasing,” and is now called “purchase by
analysis.” ¹

(f) Services Category. An individual’s purchases include in addi- 
tion to commodities to which value has been added by processes of 
fabrication and distribution, services which may or may not involve 
commodities as a means to an end. Examples are telephone and elec-
tric service, legal and medical services, transportation, theatrical or
other amusements, restaurant meals less cost of food, and the like. In 
fact, a large part of the city dweller’s purchases tend to fall in this class. 

Some industrial purchases also fall in this class—prices of installation 
of equipment, if billed separately, and architectural and engineering 
fees. The line between services and the fabricating differential may 
become pretty thin—e.g., when a contractor creosotes poles furnished 
to him, or a printer prepares publications with paper furnished to him. 

(g) Rent Category. In view of the interval which may have elapsed 
between the time a building is constructed and the time its use is paid 
for by rents, the price fluctuations of rent can probably not be analyzed 
by combining elements of value previously mentioned. 

Enough has been given, perhaps, to illustrate what I mean by urging 
that many additional price index numbers outside the field of raw and 
semi-manufactured commodities should be developed to describe the 
price structure more adequately. These new index numbers should 
allow for the lag between date of purchase contract and date of delivery 
and for the different types of competition. They should cover sep- 
arately all stages of the industrial process from extraction of natural 
resources to distribution of consumption goods and rendering of 
services. 

It will turn out, I think, that the index numbers of the fabricating
differential or of services will show less wide and less frequent fluctua-
tions than the index numbers of the prices of natural resources. If in-
dex numbers of all these types are available, it will be much easier to
obtain serviceable solutions to many business and economic problems,
such as breaking down total price into its elements, removing the effect
of price changes from series of sales or plant values, forecasting price
changes, or perhaps even revising the methods now considered appro-
priate for modifying the free play of price determining forces.


1 Albert L. Salt, President, Graybar Electric Company, Inc., “Pioneer Days of Purchasing,” Maga-
zine of Business, January, 1928, pp. 30-32 ff.

L. R. Watkins, Purchasing Agent, Western Electric Company, “The Analysis of Prices,” American
Metal Market, June 27, 1928, pp. 8-9.

E. T. Gushee, Director of Purchases, Detroit Edison Company, “Costs for Price Making,” Manu-
facturers News, June, 1930, pp. 11 ff.
















TESTS APPLIED TO AN INDEX OF THE PRICE LEVEL
FOR INDUSTRIAL STOCKS ¹


By Edgar Lawrence Smith


The tests which are the subject of this discussion were devised to 
disclose how much reliance could be placed upon an Index of the Price 
Level for Industrial Common Stocks compiled from the point of view 
of investment values rather than of price changes. 

It is not an index of average quoted prices, but rather an index of 
changes in the value of an investment originally divided equally (in 
terms of dollars) among 20 industrial stocks at their average prices for 
January, 1901. Briefly stated, the changing value of this investment 
from month to month is calculated upon the mean between the high 
and low of the prices for the stocks involved each month. In December 
of each year, the total value of the holding thus obtained is again 
divided by 20 and reinvested in the same 20 stocks, thus equalizing the 
dollar values of the holdings each December. All split-ups, stock 
dividends, the value of all rights, and of all other forms of capital 
distribution, are retained in the dollar values. Only the cash dividends 
normally paid out of income are allowed to disappear. 
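
The method just described can be sketched in a few lines. This is a simplified illustration, not the compiler's actual calculation: it uses two hypothetical stocks in place of twenty, invented high and low prices, and it ignores split-ups, stock dividends and rights, which the Index itself retains in the dollar values.

```python
def fund_values(monthly_quotes, start_value=100.0):
    """Value of an equal-dollar fund, re-equalized each December.

    `monthly_quotes` is a chronological list of (month, quotes) pairs,
    where month is 1..12 and quotes maps stock -> (high, low).
    """
    values = []
    shares = None
    for month, quotes in monthly_quotes:
        # Monthly price taken as the mean of the high and the low.
        price = {s: (high + low) / 2.0 for s, (high, low) in quotes.items()}
        if shares is None:
            # Initial investment divided equally, in dollars, among the stocks.
            shares = {s: start_value / len(price) / p for s, p in price.items()}
        value = sum(shares[s] * price[s] for s in shares)
        values.append(value)
        if month == 12:
            # December: divide the total equally and reinvest in the same stocks.
            shares = {s: value / len(price) / price[s] for s in price}
    return values


# Hypothetical highs and lows for two stocks, A and B.
QUOTES = [
    (1,  {"A": (11.0, 9.0),  "B": (21.0, 19.0)}),
    (12, {"A": (22.0, 18.0), "B": (20.0, 20.0)}),
    (1,  {"A": (21.0, 19.0), "B": (11.0, 9.0)}),
]

values = fund_values(QUOTES)
```

With these invented prices the fund stands at 100, 150 and 112.5. Without the December equalization the last figure would be 125, which shows that the annual reinvestment, and not merely the choice of stocks, shapes the resulting Index.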

The basis upon which the stocks were selected is important. A 
search was made for as many different industries as could be clearly 
identified in the stock list of 1926, when the Index was first compiled. 
This brought to light 25 distinct industries. A leading company was 
then chosen to represent each of these industries. No distinction was 
made between industries which had successful or unsuccessful records. 
From this point of view, every type of industry is represented. In the 
earlier period, from 1901 to 1914, it was not possible to identify 25 
clearly distinguishable industries in the list of the New York Stock 
Exchange. So stocks representing 20 industries were used in this 
period. In 1915, 24 stocks were carried. 

The Index, then, is a record of the changes in dollar value that would 
have occurred in a fund invested equally in 20 stocks from 1901 to 1914 
inclusive, the dollar value of the investment in each stock being equal- 
ized annually. In 1915, the dollar values were divided among 24 
stocks, and among 25 stocks thereafter; each stock representing a 
leading company in a separate industry; the industries represented 
being all that could be clearly identified in the listings of the New York 
Stock Exchange in the respective periods. 


1 The calculations used in these indices were made by William P. Shea, of Irving Investors Manage- 
ment Company, Inc. 










This, then, is the basis, in brief,¹ of the Index we are to examine and
to test in relation to other more general data, in an effort to determine 
whether or not it accurately depicts changes in price level for industrial 
common stocks. 

The principal characteristic of this Index (see Chart I) is an upward
trend represented by an unbroken straight line running through central
points in the markets of 1916-1917, 1918-1919, 1920-1921, and 1924-
1925. These central points are enclosed in circles on the chart, for
emphasis. The straight line on the ratio scale used, extended to the
early years of the century, runs between six similar central points,
three above and three below, all of them close to the line.


CHART I

AN INDEX OF THE PRICE LEVEL FOR INDUSTRIAL STOCKS
SHOWING LONG TERM UPWARD TREND

The trend so recorded is at the high rate of 9.46 per cent compounded 
annually and we are able to add broken lines parallel to the trend line, 
as a method of comparing the levels recently attained with previous 
high and low phases of the Index, something that it is not possible to do 
in connection with many other indices of industrial stock prices, which 
fail to disclose any consistent trend over so long a period.²
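
A trend compounded annually at a fixed rate plots as a straight line on a ratio scale, and a line parallel to it is simply the trend multiplied by a constant ratio. The sketch below illustrates this with the 9.46 per cent rate given in the text; the starting value of 100 and the ratio of 1.5 are hypothetical.

```python
import math

TREND_RATE = 0.0946  # annual compound rate stated in the text


def trend_value(start_value, years, rate=TREND_RATE):
    """Value on the trend line after `years` of annual compounding."""
    return start_value * (1.0 + rate) ** years


def parallel_line(start_value, years, ratio, rate=TREND_RATE):
    """A line parallel to the trend on a ratio scale: the trend times a fixed ratio."""
    return ratio * trend_value(start_value, years, rate)


# Years needed for the trend value to double.
doubling_years = math.log(2.0) / math.log(1.0 + TREND_RATE)

# On a ratio (logarithmic) scale the trend rises by equal steps each year.
log_steps = [
    math.log10(trend_value(100.0, y + 1)) - math.log10(trend_value(100.0, y))
    for y in range(5)
]
```

At this rate the trend value doubles in a little under eight years, which gives some measure of what is meant by calling the rate high.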

It is naturally of importance to discover whether an index of the 
stock price level disclosing such a trend is in fact valid. To this end 
a number of tests and comparisons have been made with interesting 
results. 


1 More fully described in Review of Economic Statistics, Harvard Economic Society, January, 1927. 
2 The Standard Statistics Index of 337 Industrials parallels this Index, after the low of 1921.










The first of these comparisons is recorded on Chart II, showing the 
relation which existed between this Stock Index and certain bank 
reserve figures modified by interest rates. 

CHART II

AN INDEX OF THE PRICE LEVEL FOR INDUSTRIAL STOCKS
COMPARED WITH BANK RESERVES ÷ INTEREST RATES

(MONETARY GOLD) ÷ (MONTHLY AV. COMM. PAPER RATE)

(GOLD + F. R. EARNING ASSETS) ÷ (MONTHLY AV. COMM. PAPER RATE)


On this chart appears, in addition to the Index of the Price Level for
Industrial Stocks, a broken line which, from 1901 to 1914, prior to the
organization of the Federal Reserve System, was obtained by dividing
the stock of monetary gold in the country by the interest rate on 60-90
day commercial paper. From 1914 to date, Federal Reserve Earning
Assets were added to the stock of monetary gold before dividing by
interest rates. Only the insertion of a decimal point was necessary to
fit this line to the stock market up to 1924.
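
The construction of the broken line can be sketched as below. The function name, the units and all the figures are hypothetical; only the form of the calculation (reserves divided by the commercial paper rate, then rescaled by a power of ten) is taken from the text.

```python
def credit_line(monetary_gold, paper_rate, earning_assets=0.0, scale=1.0):
    """Bank reserves divided by the commercial paper rate.

    Before 1914 the numerator is monetary gold alone; afterwards the
    Federal Reserve earning assets are added to it. `scale` stands for
    the "insertion of a decimal point", that is, a power of ten.
    """
    if paper_rate <= 0:
        raise ValueError("interest rate must be positive")
    return scale * (monetary_gold + earning_assets) / paper_rate


# Hypothetical figures: gold and earning assets in millions of dollars,
# commercial paper rate in per cent.
before_1914 = credit_line(4000.0, 4.0)
after_1914 = credit_line(4000.0, 5.0, earning_assets=1000.0)
rescaled = credit_line(4000.0, 4.0, scale=0.1)
```

The line rises when reserves grow or when the paper rate falls, so that a sharp decline in rates, such as that of 1930, raises it even with reserves unchanged.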

As I said at the Chicago meeting in 1928 (when this relationship was 
first discussed), if this chart had been compiled to cover only the ten- 
year period ending in 1925, it might have appeared that an inviolable 
relationship existed between industrial stock prices, bank reserves and 
interest rates. For certainly between 1915 and 1925, in the earlier 
years of the Federal Reserve System, the relationship is marked. But 
shortly after 1924 it disappeared. The breaking down of the earlier 
relationship disclosed in this chart, which became increasingly apparent 
in 1927 and 1928, led to some doubt, for a time, as to whether this 
method of charting stock price movements furnished, after all, a reliable 
index for the market as a whole. 

But now that a number of years have elapsed, we can see more clearly
what has occurred. New measures have therefore been introduced.
And it is these new measures that I wish particularly to bring to your
attention today.

Before leaving Chart II, let us notice that the relationship between 
bank reserves, interest rates and the price level for industrial stocks, 
came to an end in the years 1924 and 1925, and that the disparity 
between them developed increasingly, until the violent fall of the 
market from September, 1929, to the end of 1930, accompanied by a 
rapid decline in commercial paper rates, brought them again into some 
proximity. 

It is apparent that something quite new must have affected the 
relationship between stock prices and underlying credit factors in 1924 
and 1925, and must have continued to affect them at least up to Septem- 
ber, 1929. 

Perhaps the most noticeable change which occurred during this 
particular period is to be found in the volume of trading on the New 
York Stock Exchange. This change began to develop just as the 
relationship disclosed in Chart II began to disappear. On the next 
chart (III) is recorded the monthly volume of trading on the New York 
Stock Exchange from 1901 to date. And here we see that, commencing 
with 1925 and continuing to the end of 1930, transactions are on a scale 
beyond all previous records for the century. 

The first signs of this increased interest in stock trading became 
evident in December, 1924, and even today stock trading has not 
receded to a level comparable to any period prior to 1924. 

This, then, was a new factor in the situation: volume of trading,
that is, velocity.

This idea of velocity leads back, though not entirely on the orthodox 
path, toward the economic conception of price, in which not only 
changes in the volume of purchasing power, but changes in the velocity 
of purchasing power as well, contribute to changes in the general price 
level. 

Assuming, then, that the greater part of the purchasing power used
in the making of stock prices was to be found in security loans, it
seemed reasonable that the velocity with which this purchasing power
was used in the purchase and sale of stock might be measured by
dividing the volume of trading on the Exchange by the total of secured
loans.
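The measure just described is a simple quotient, sketched below under stated assumptions. The function name, the units and the two monthly figures are hypothetical and are not taken from the charts.

```python
def credit_velocity(shares_traded: float, secured_loans: float) -> float:
    """Volume of trading on the Exchange divided by the total of secured loans."""
    if secured_loans <= 0:
        raise ValueError("secured_loans must be positive")
    return shares_traded / secured_loans

# Hypothetical months: (label, shares traded in millions, secured loans in millions of dollars).
months = [("quiet month", 40.0, 8000.0), ("active month", 120.0, 12000.0)]
velocity = {label: credit_velocity(shares, loans) for label, shares, loans in months}
```

In this made-up case loans rise by half while trading trebles, so the velocity figure doubles; the quotient isolates the part of the rise in trading that is not explained by a larger volume of credit.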

On the bottom of Chart IV will be found two lines. The dotted
line (CR) represents the volume of credit released for trading purposes
by loans on securities for all member banks (partly estimated), to which
have been added “loans for the account of others” from 1926 on. We
have, then, in this dotted line (CR) an index of changes in the amount
of credit secured by the deposit of stocks and bonds, both in New
York and throughout the country.

The light unbroken line (VEL), also at the bottom of the page, is an
index of the velocity of this credit, obtained by dividing the volume of
stock exchange transactions by the amount of these secured loans.
The volume of transactions on the New York Stock Exchange, when
used in this way, must be regarded as an index of the volume of trading
in securities outside as well as inside the confines of this exchange.

CHART III

MONTHLY VOLUME OF TRADING
NEW YORK STOCK EXCHANGE

[Vertical axis: millions of shares; horizontal axis: years, 1901 to date]

CHART IV

AN INDEX OF THE PRICE LEVEL FOR INDUSTRIAL STOCKS
COMPARED WITH CREDIT + VELOCITY OF CREDIT

COMPILED BY EDGAR LAWRENCE SMITH

As a method of smoothing the curve, the 12-month moving average
of the velocity of collateral loans, recorded in the last month of each
12-month period, has been used.
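A moving average recorded in the last month of each period is what is now usually called a trailing average. A minimal sketch follows; the function name and the 14 made-up monthly values are assumptions for illustration only.

```python
from typing import List, Optional

def trailing_moving_average(values: List[float],
                            window: int = 12) -> List[Optional[float]]:
    """Average of each `window` consecutive values, recorded at the last
    month of the window. Months without a full window are None."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the month leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out

monthly_velocity = [float(m) for m in range(1, 15)]  # 14 made-up months
smoothed = trailing_moving_average(monthly_velocity)
```

Because each average is dated at the end of its period, the smoothed line lags the monthly figures by roughly half the window, a point worth bearing in mind when comparing turning points.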

Both of these lines at the bottom of the chart are index numbers, in 
which the data for November, 1920, the first month in which figures 
were available, has been taken as 100. So it is possible to combine 
them. The sum of their changes since 1920, representing changes both 
in the volume and in the velocity of collateral loans, appears in the 
light continuous line (CR + VEL) fitted to our Index of Industrial 
Stocks at the low point of the market in August, 1921. The rest of its 
course it has pursued without correction or change. 
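One possible reading of this combination is sketched below. The text does not spell out the arithmetic, so two points are assumptions: that "the sum of their changes" means the sum of index-point changes from the base of 100, and that "fitted" means a single constant multiplier chosen at the fitting month. All figures and names are invented.

```python
from typing import Dict

def to_index(series: Dict[str, float], base_month: str) -> Dict[str, float]:
    """Express a series as index numbers with the base month equal to 100."""
    base = series[base_month]
    return {m: 100.0 * v / base for m, v in series.items()}

def combine_and_fit(cr: Dict[str, float], vel: Dict[str, float],
                    stock_index: Dict[str, float],
                    fit_month: str) -> Dict[str, float]:
    """Add the changes of two base-100 indexes, then tie the result to the
    stock index at one month only (assumed: a constant multiplier)."""
    combined = {m: 100.0 + (cr[m] - 100.0) + (vel[m] - 100.0) for m in cr}
    factor = stock_index[fit_month] / combined[fit_month]
    return {m: factor * v for m, v in combined.items()}

# Made-up figures, for illustration only.
loans = {"1920-11": 5000.0, "1921-08": 4500.0, "1925-01": 7000.0}
velocity = {"1920-11": 0.004, "1921-08": 0.003, "1925-01": 0.006}
stocks = {"1920-11": 60.0, "1921-08": 52.0, "1925-01": 95.0}

cr_index = to_index(loans, "1920-11")
vel_index = to_index(velocity, "1920-11")
fitted = combine_and_fit(cr_index, vel_index, stocks, "1921-08")
```

After the single fitting month no further adjustment is applied, which corresponds to the statement that the line pursued the rest of its course without correction or change.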

The picture is striking, and seems to disclose two items of value: 
(1) For the period from 1921 to date, it suggests that changes in the 
general level of industrial stock prices correspond to changes in the 
sum of the amount and the velocity of loans against securities. (2) It 
contains further confirmation, for this latter period, of the validity of 
our Index of the Price Level for Industrial Stocks. 

The comparisons recorded on Charts II and IV go a long way toward 
supporting the belief that the method of appraising the price level for 
industrial stocks, employed in creating this Index, must be sound in 
principle. On no other premise would it be possible to account for the 
relationships disclosed in these two charts. 

At first it may seem incredible that any method applied to only 25 
out of the hundreds of stocks listed on the New York Stock Exchange, 
could produce an index that would stand up under these tests. But it 
is less difficult to understand when we realize that, as of December 1, 
1930, the total market value of the 25 issues represented in this Index 
amounted to more than 30 per cent of the total market value of all 
industrial and utility issues listed on the Exchange. 

But, if this Stock Index is correct in principle, and its movements 
closely related to the general credit situation, it should be possible to 
find some series of credit data available throughout the entire period 
from 1901 to date that would at least approximate the Stock Price 
Index for the whole period. And this has proved to be possible by 
employing again the idea that price involves not only the amount of 
credit, but also its velocity. 

In Chart V appears again the Index of the Price Level for Industrial 
Stocks, accompanied by a lighter line. This lighter line is derived 
from figures of the New York City Clearing House Banks. Following 
the procedure suggested by Chart IV, it would have been valuable if a 
loan figure had been available. But the Clearing House banks did not 
report their loans separately in the earlier years of the century; they 
reported them only in an item which included their investments as well. 





CHART V

AN INDEX OF THE PRICE LEVEL FOR INDUSTRIAL STOCKS
COMPARED WITH N.Y. BANK DEPOSITS + CLEARINGS

COMPILED BY EDGAR LAWRENCE SMITH





So no loan figure was available throughout the period. Consequently, 
we used a figure normally quite parallel to the loan figure, namely, net 
demand and time deposits. Continuity made it necessary to estimate 
even this figure from 1901 to 1910, in that, prior to 1911, figures for 
Trust Company members of the Clearing House were not included. 
In 1911, when they were included, Trust Company deposits represented 
30 per cent of the total reported, so earlier figures were all increased 
by 30 per cent. 
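The adjustment of the earlier deposit figures, as worded above, amounts to a flat uplift. The sketch below follows that wording; the function name and the sample figure are hypothetical.

```python
TRUST_SHARE_1911 = 0.30  # proportion stated in the text

def estimate_total_deposits(reported_ex_trust: float,
                            uplift: float = TRUST_SHARE_1911) -> float:
    """Raise a pre-1911 figure, which excludes Trust Company members,
    by the stated 30 per cent."""
    return reported_ex_trust * (1.0 + uplift)

reported_1905 = 1000.0  # hypothetical, millions of dollars
estimated_1905 = estimate_total_deposits(reported_1905)
```

If the 30 per cent is taken as a share of the combined total rather than of the earlier basis, the implied multiplier would instead be 1 / (1 - 0.30). The text does not say which was intended, and the sketch simply follows the wording as given.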

Thus net deposits of New York Clearing House Banks (partly 
estimated) provide a series broadly comparable to line CR in Chart IV, 
with fluctuations approximating changes in the volume of credit or 
purchasing power available for security transactions. 

For the velocity factor, New York City Bank Clearings are used 
without change. On account of the monthly variations of each of these 
items, and their seasonal behavior, the curve has been smoothed by 
using 2-months moving averages, corrected for seasonal variations. 

In the form of index numbers, these items have been combined
through the addition of their respective changes since 1901 and fitted to
the Stock Index, on the basis of the average of credit plus velocity being
equal to the average for stock prices in the year 1901. No other
adjustments were made until the item of “Loans for the Account of
Others” became available in 1926. “Loans for the Account of
Others,” for the purposes of this study, were regarded as additional
deposits, and added to the amount of credit, as represented by net
deposits of New York Clearing House Banks.













The disparity between this credit plus velocity index on the one hand, 
and the Stock Index on the other, which lasted throughout the war 
period, is too easily understood to require discussion. 

That an index based primarily on the Net Deposits of New York 
Clearing House Banks, plus New York Bank Clearings, and fitted to our 
Index of Stock Prices in 1901, should automatically hold so close to the 
Stock Index through 1914; should then rise above it only during a period 
when war activity fully accounted for the disparity; and should again 
hold so closely to it right up to September, 1929, and to date, seems 
again to indicate that a correct method of charting price changes in a 
limited number of the larger industrial stocks, each chosen from a 
separate major industry, produces a reliable index of the general level 
of industrial stock prices. And this is what we set out to demonstrate. 

As a result of the tests, however, a few additional conclusions may be 
drawn, namely: 

1. That, measured in bank dollars, the rate of organic growth in an 
investment in a cross section of American industry has shown appre- 
ciation, during the past 30 years, at a rate approximating 9.46 per cent 
compounded annually, after dividends (Chart I). This does not imply 
that such a rate may be projected into the future. Yet a long term 
upward trend at varying rates since 1837 has been disclosed by earlier 
studies.¹

2. That stock prices habitually fluctuate above and below a central 
trend, and that, having established an approximation of this trend, 
it is possible, with some degree of accuracy, to compare current stock 
price levels with previous levels. Without establishing such a trend, 
comparisons of this sort are not valid (Chart I). 

3. That gold, up to 1914, and later gold plus Federal Reserve Earn- 
ing Assets (representing the trend in bank reserves), divided by interest 
rates, furnished an index which bore a discernible relation to stock 
prices prior to 1925—a relation, however, which ceased from 1925 to 
1930, though there is some indication that it may be resumed (Chart 
II). 

4. That stock market activity rose to unprecedented proportions, 
beginning in 1925, and reached a climax in 1929 (Chart III).

5. That changes in velocity which this activity gave to secured loans, 
whether made by banks or for the account of others, added to changes 
in the volume of such loans, corresponded with changes in the price 
level for industrial stocks (Chart IV). 
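To give a sense of the scale implied by the rate named in the first conclusion, the compounding may be worked out directly. This is arithmetic on the stated rate only; the function name is an assumption and nothing here is drawn from Chart I itself.

```python
def compound_growth(rate_per_cent: float, years: int) -> float:
    """Multiple reached by one unit growing at a fixed annual rate."""
    return (1.0 + rate_per_cent / 100.0) ** years

# 9.46 per cent compounded annually over 30 years: roughly fifteenfold.
multiple_30_years = compound_growth(9.46, 30)

# First whole year in which the original sum has at least doubled.
doubling_years = next(n for n in range(1, 100)
                      if compound_growth(9.46, n) >= 2.0)
```

The size of the resulting multiple shows why the first conclusion is careful not to project such a rate into the future.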


1 Common Stocks as Long Term Investments, by the present writer, Macmillan Company, 1924. 






















ANALYZING THE RELATIONSHIP BETWEEN STOCK 
PRICES, EARNINGS, AND DIVIDENDS 


By WILLFORD I. KING


Since the ratio of price to earnings has come to be considered such an 
important factor in determining whether or not a stock is a desirable 
purchase, and since the average of such ratios is believed by many to 
have significance in indicating the probable future drift of stock prices 
in general, a number of periodicals are not only publishing regularly 
such ratios for individual stocks but are also presenting each month
averages of these ratios for large groups of stocks. As it happens, 
however, the different publications very commonly disagree as to the 
ratio for a given stock, and also as to the average for stocks in general 
at a given date. 

These discrepancies arise from several distinct sources. Some
periodicals calculate each ratio on the basis of the earnings of the
company for its last fiscal year. This procedure means that in December,
1930, ratios are still being computed on the basis of 1929 earnings.
Since these earnings were, in some cases, several times as large as those
for the most recent quarter, the result is to indicate extremely low
price-earnings ratios for many stocks, even though the ratios to present
earnings may be relatively high.

Other publications, in an effort to give greater reality to the results,
calculate all ratios on the basis of the earnings in the last quarter for 
which reports are available. The adoption of this procedure yields 
high ratios for companies keeping their reports well up-to-date and low 
ratios for the laggards. 

A third procedure is to base the computed ratio upon estimated
earnings for the current year. If earnings in general are declining, the
earnings for the entire year will be relatively larger than the earnings
for the most recent quarter, hence this third method will, in the
majority of cases, obviously give ratios too low to be indicative of
current conditions.

The fourth procedure, which is perhaps the best from the standpoint
of logic, is to compute all ratios on the basis of estimated earnings for
the last quarter. In practice it is difficult to apply this method, for
many concerns either publish no quarterly reports or delay so long
before giving out the necessary information that the issue of the
periodical in which the results should appear is off the press long before
the earnings figures are available. In every such instance, the absence
of actual data makes it necessary to use estimates of earnings, and, in
too many cases, the estimates must be based upon such slender
information that they can be considered little better than blind guesses.
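How far the several procedures can disagree for one and the same stock may be seen from a small sketch. The stock, its price and its earnings are invented, and putting the quarterly figure on an annual basis by multiplying by four is an assumption made here, not a rule stated in the paper.

```python
def price_earnings_ratio(price: float, annual_earnings: float) -> float:
    """Price per share divided by earnings per share on an annual basis."""
    if annual_earnings <= 0:
        raise ValueError("ratio is not meaningful for zero or negative earnings")
    return price / annual_earnings

# Hypothetical stock in December, 1930, during a period of falling earnings.
price = 40.0
last_fiscal_year_eps = 8.0   # earnings for 1929
latest_quarter_eps = 0.50    # most recent reported quarter
estimated_year_eps = 3.2     # estimate for the whole of 1930

ratios = {
    "last fiscal year": price_earnings_ratio(price, last_fiscal_year_eps),
    "latest quarter, annualized": price_earnings_ratio(price, 4 * latest_quarter_eps),
    "estimated current year": price_earnings_ratio(price, estimated_year_eps),
}
```

With earnings declining, the ratio on the last fiscal year is the lowest, the ratio on the latest quarter the highest, and the ratio on the estimated year falls between them, which is the ordering the text describes.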

Even when complete corporation reports are available, one can 
never be sure that the earnings there set forth are real, for an intelligent 
executive assisted by a skilled accountant can, at his discretion, cause 
reported earnings either to shrink or expand to a remarkable degree. 
From what has just been said, it follows that the best that the statis- 
tician can hope to do is to secure satisfactory recent earnings data for 
only a fraction of the companies listed on the exchange. 

Even when the statistician possesses accurate current data for a
large number of stocks, there are still obstacles in the way of bringing
out the true relationships existing between prices, dividends, and
earnings. Experience indicates that, in general, if the earnings
applicable to two stocks are identical, being, say, $8 per share in each case,
but if one company is paying $4 in dividends and the other one nothing,
the stock paying dividends will command the higher price on the
market. Apparently there are two reasons for this difference:





1. The payment of a dividend places in the hands of the stockholder
money which he can spend and furnishes an ocular indication 
that he is receiving income on his investment. 

2. The investor feels that the payment of dividends gives assurance 
that the earnings reported by the company are genuine and 
not merely fictitious. That, in the great majority of cases, the 
investor is correct in his inference, is indicated by such evidence 
as is available. 


Since the market values more highly a dollar in dividends than it
does a dollar of reported earnings, any adequate analysis must
obviously differentiate between earnings used to pay dividends and
earnings carried to surplus. The latter quantity will hereafter be referred
to as “surplus earnings.” Comparison of different companies in this
respect is hampered by the fact that different stocks have different
dividend rates and also different fractions of earnings carried to
surplus. The first step in simplifying the problem is to eliminate the
first-mentioned variable. This is readily accomplished by using the
following procedure: For each stock, divide the current annual
dividend, the surplus earnings per share, and the price per share by a
common divisor, namely, the number of dollars in the existing rate of
dividend paid on that stock. The effect of so doing is illustrated by
the following example. Let us suppose that General Electric stock
is earning $2.40 per share annually, paying $1.60 dividends, and selling
for $48. Since the dividend rate is $1.60, we imagine a stock dividend
of 60 per cent. This is equivalent to dividing by 1.6 all of the three
quantities mentioned. We then have a hypothetical share paying
$1.60/1.6 or $1.00 dividend, earning $2.40/1.6 or $1.50, and valued at
$48/1.6 or $30. All other stocks are treated similarly. The differential in
dividend rates having thus been eliminated, it is possible to proceed
to a direct comparison of stock prices with surplus earnings per share.
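The adjustment to a $1 dividend can be written out as a short routine, using the General Electric figures supposed in the text. The type and function names are assumptions introduced for the sketch.

```python
from typing import NamedTuple

class AdjustedShare(NamedTuple):
    dividend: float
    earnings: float
    surplus_earnings: float
    price: float

def adjust_to_dollar_dividend(earnings: float, dividend: float,
                              price: float) -> AdjustedShare:
    """Divide dividend, earnings, surplus earnings and price by the
    number of dollars in the existing dividend rate."""
    if dividend <= 0:
        raise ValueError("the adjustment applies only to dividend-paying stocks")
    return AdjustedShare(
        dividend=dividend / dividend,
        earnings=earnings / dividend,
        surplus_earnings=(earnings - dividend) / dividend,
        price=price / dividend,
    )

# Figures supposed in the text: earning $2.40, paying $1.60, selling for $48.
ge = adjust_to_dollar_dividend(earnings=2.40, dividend=1.60, price=48.0)
```

The routine refuses a zero dividend because the divisor would vanish; this is consistent with the chart being confined to dividend-paying stocks.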

This comparison may, of course, be made on the basis of averages. 
When this procedure is decided upon, the statistician is, obviously, 
confronted with the problem of what type of average to employ and 
what weights to use. Ordinarily, his decision on these points may be 
of little consequence, but, when a situation like that existing last 
spring prevails, the question of weighting may determine whether the 
average arrived at is high or low. This was true because, at that time, 
the “blue chips,” representing for the most part huge concerns, were
far out of line as compared to the stock list in general. 

At best, averages cannot avoid obscuring details. A graphic method 
of comparing stock prices with dividends and surplus earnings has, 
therefore, much to commend it. Perhaps the simplest form is a scatter 
diagram like that illustrated in the accompanying chart. Each small 
circle represents a single stock, the adjusted earnings thereof being 
plotted horizontally and the adjusted price vertically. The distribu- 
tion of the circles themselves shows in a broad way the relationship 
existing between prices and earnings in general. The diagram can, 
however, be made more intelligible by applying to it an averaging 
process. 

The mathematically inclined statistician may begin by fitting a 
straight line or a curve to the vertical readings of the points, using 
perhaps the method of least squares. He may next, by a similar 
method, fit another line or curve to the horizontal readings of the 
points. My personal feeling is that one can secure as useful results 
with much less effort by first ascertaining the medians of vertical 
columns of data and next ascertaining the medians of the horizontal 
zones of data. On the chart the vertical medians have been indicated 
by squares and the horizontal ones by triangles. It will be observed 
that the widths of the columns and zones are far from uniform. They 
have been adjusted to accord roughly with the relative densities of the 
items in the various sections of the diagram. 
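The median procedure can be sketched as below, under stated assumptions. The data are invented, the band edges are chosen by hand to be of unequal width as the text describes, and each median is placed at the center of its band; the shifting of a marker toward the side of concentration is not attempted.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (adjusted earnings, adjusted price)

def band_medians(points: Sequence[Point], edges: Sequence[float],
                 axis: int) -> List[Tuple[float, float]]:
    """Group points into bands along `axis` (0 = earnings columns,
    1 = price zones). For each non-empty band return the band center
    and the median of the other coordinate."""
    other = 1 - axis
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        members = [p[other] for p in points if lo <= p[axis] < hi]
        if members:
            result.append(((lo + hi) / 2.0, median(members)))
    return result

# Made-up adjusted figures, for illustration only.
stocks = [(0.8, 9.0), (1.0, 10.0), (1.2, 14.0), (1.9, 18.0),
          (2.1, 16.0), (2.4, 22.0), (3.5, 24.0), (4.5, 30.0)]

# Squares: median price within each vertical column of earnings.
squares = band_medians(stocks, edges=[0.5, 1.5, 2.5, 5.0], axis=0)
# Triangles: median earnings within each horizontal zone of price.
triangles = band_medians(stocks, edges=[5.0, 15.0, 20.0, 35.0], axis=1)
```

Each square is a pair (earnings at column center, median price), and each triangle a pair (price at zone center, median earnings), so the two sets play the part of the two regression lines without any fitting by least squares.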

As a rule, the squares or triangles are inserted at the centers of the
columns or zones to which they apply. When, however, the items in
a column or zone are concentrated more largely on one side than on
the other, the square or triangle is moved in the direction of
concentration.

CHART I

LINE OF RELATIONSHIP BETWEEN PRICE AND EARNINGS
DIVIDEND-PAYING INDUSTRIAL COMMON STOCKS *
(ADJUSTED TO $1 DIVIDEND)
NOVEMBER 1, 1930

[Vertical axis: price adjusted to $1 dividend; horizontal axis: earnings
adjusted to $1 dividend, 0 to 6.00]

* Mr. John B. Hillman is to be credited with most of the statistical and graphic work in this study.

The curve AB has been fitted free hand to the line of the squares. 
It represents the price at which a stock having a $1 dividend rate and 
the specified amount of earnings will probably sell, when not only 
earnings but all other factors affecting its value are taken into con- 
sideration. 

Similarly, the curve RC has been fitted free hand to the triangles.
It represents the earnings which might be expected on a stock having
a $1 dividend and selling for the given price, all other forces being
taken into consideration.

Now if all records were perfect and if stock prices were dependent
solely upon earnings and dividends, the small circles would have no
scatter but would all lie along a single curve located somewhere
between AB and RC. This hypothetical curve may be designated as
the “line of relationship.” Records of earnings are, in fact, very
imperfect. Stock prices are influenced largely by prospective as well
as by past earnings, by forced sales, by the cash position of the
company, by gambling and speculation based upon rumors or “hunches,”
and by scores of other forces more or less potent. As a result, the
circles do not cluster closely along a curve but instead are scattered
about in such a manner that a correlation surface representing the
distribution would resemble a wedge-shaped ridge, high at the
southwest end, with rather precipitous sides at that point and running down
on a broad gentle slope toward the northeast. The line SK has been
inserted bisecting the angle between AB and RC. It represents an
attempt to approximate the location of the crest of the ridge, in other
words the “line of relationship.” In so far as this attempt has been
successful, the line SK shows what the relationship between stock
prices and earnings would be were all other forces affecting stock
prices non-existent.

At the intersection of the curve SK with the ordinate representing 
per share earnings of $1—the exact amount required to pay the divi- 
dend at the existing rate—the surplus earnings are clearly zero. The 
vertical scale reading at that point on the line SK represents, there- 
fore, the value which the market places upon a dollar of dividends, 
when earnings are barely sufficient to cover dividends. On November 
1, 1930, this valuation was $9.60. 

The upper horizontal scale line has been inserted to facilitate reading 
off the additional values given by the market to varying amounts of 
surplus earnings per adjusted share of stock. We see that, at the date 
mentioned, the stock having $1.00 surplus earnings, if valued solely on 
the basis of earnings, would be worth $19.40. By subtracting the $9.60 
value arising from the dividend we arrive at $9.80 as the value at- 
tached to the first dollar of surplus earnings. Evidently this dollar 
is valued no less than the one necessary to pay dividends, probably 
because its existence is considered essential to guarantee the safety of 
the dividend. 
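The subtraction just described generalizes to any run of readings taken from line SK: each successive dollar is valued at the difference between neighboring readings. A minimal sketch, using only the two readings quoted in the text; the function name is an assumption.

```python
from decimal import Decimal
from typing import List

def marginal_values(cumulative: List[Decimal]) -> List[Decimal]:
    """Differences between successive cumulative readings, the first
    reading being measured from zero."""
    previous = [Decimal("0")] + cumulative[:-1]
    return [b - a for a, b in zip(previous, cumulative)]

# Readings of line SK on November 1, 1930, as quoted in the text:
# value with earnings just covering the $1 dividend, then the value
# with $1.00 of surplus earnings in addition.
sk_readings = [Decimal("9.60"), Decimal("19.40")]
dividend_dollar, first_surplus_dollar = marginal_values(sk_readings)
```

Decimal arithmetic is used so that the dollar-and-cent differences come out exactly, without binary rounding.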

Additional dollars of surplus earnings are, however, valued less than 
the first dollar, if we are to trust the indications given by curve SK. 
The way in which the value declines is shown by the following figures: 











                                        Valuation placed upon
Number of dollar                        successive dollars of
                                        surplus earnings

Dividend ..............................        $9.60
First .................................         9.80
Second ................................         7.7
Third .................................         6.60
Fourth ................................         6.30


It should, of course, be understood that the above figures represent 
nothing but a very rough approximation, for the scatter of the items 
in the upper right-hand section of the field is so great that the location 
of the SK line can at best be determined with nothing more than a 
moderate approach to its true position.

This method of determining the relationship between prices, divi- 
dends, and earnings can indeed rarely be depended upon to give any- 
thing more than approximate results. Since it lacks the appearance 
of accuracy given by an average, it will not appeal to the analyst who 
demands precision in his findings. Its advantage lies mainly in the 
fact that it enables the statistician to see how things really stand and 
therefore tends to prevent him from being deceived by the fictitious 
accuracy of his findings. 

It may be well to note for the benefit of those not familiar with the 
behavior of regression lines that a broad angle between lines AB and 
RC indicates a wide scatter of points, poor correlation between prices 
and earnings, and a large probable error in the location of SK, the line 
of relationship. The reverse conclusion may, of course, be drawn
when the angle between AB and RC is small. 




























PROGRESS OF BANKING STATISTICS 


By E. A. GOLDENWEISER


Banking statistics have increased enormously in volume in recent 
years and I believe that they have improved in quality—not corre- 
spondingly, but still to a creditable extent. The improvement has been 
in increasing frequency and promptness of publication, so that figures 
for some groups of banks become available weekly on the day following 
the condition report; the figures have also become more accurate, but 
the principal improvement has been in more adequate analysis and 
classification of the returns, so that they now give a better picture of 
what is happening and where, than was obtainable on the basis of 
aggregates for the nation as a whole. 

The development of these bigger and better banking statistics has
made them more popular with writers and has subjected them to a great
deal of study and interpretation, some of which has been good and has
contributed to a better understanding of the course of business and
credit; some of which has not been so good. Interpretations of the
latter type have been due not to defects in the figures, but largely to
limitations of the figurers, who have stretched the figures beyond the
limits of their carrying capacity to the point where they have become
too thin to support the heavy traffic to which they are subjected by
their sponsors.

One of the favorite abuses of banking figures takes the form of estab- 
lishing a long series showing the growth of bank credit from year to 
year, then averaging the rate of increase and thus arriving at an average 
or normal growth of credit in this country. This average is then set up 
as a measuring rod on which the financial authorities should base credit 
policy, and woe to them if they do not! If the growth for a given 
period is more than this average, then we are told that measures of 
restraint must be used, and if it is less than “normal,” then methods of 
encouragement are said to be imperative. If the authorities do not 
follow these rules—then they are held responsible for booms and 
depressions. The trouble with this formula, as with many others, is
that it is applicable in “normal” times, while the characteristic of the
normal, as also of the average, is that it does not exist. No times are
average times or normal times. If you look on the variations in the
percentages of change of bank credit from year to year since 1895, you
will find that they vary from plus 16 to minus 5. Is there any
significance in an average derived from such figures? In retrospect there
may be. When you can say that in a given year credit grew by less
than the average amount, but this is explained by such and such cir- 
cumstances, your statement may be illuminating, and as an approach 
to interpretation of past events the average may be useful. But it is 
too crude, too little tested and subject to too many accidental and 
unforeseeable variations to be trustworthy as a guide to current credit 
policy. And by the time cognizance has been taken of unusual factors 
in a given situation, there is nothing left for the average to do, but to 
retire. Take, for instance, 1929. Bank credit was not growing; policy 
indicated—monetary ease. But, as every one knows, it was a year of 
extraordinary expansion of credit and of speculative activity. Credit 
policy of necessity was one of restraint. There are explanations: loans 
were being made in enormous amounts by non-banking lenders. Ade- 
quate; in retrospect it is clear. But what is the object of setting up a 
mechanical formula, when for practical purposes many factors not 
included have to be considered? Better consider all the factors avail- 
able and measurable and in addition pray for guidance in their proper 
evaluation. Use averages by all means—but don’t abuse them and 
don’t shift to them the responsibilities which should rest on the exercise 
of the best human judgment. 

Another example of banking figures on which an excessive load is 
sometimes piled are bank debits—or bank clearings. These, too, are 
useful figures. They make it possible to measure turnover or velocity 
of circulation of bank deposits, a factor in the equation of exchange that 
is often not taken sufficiently into consideration in discussing the rela- 
tionship between business and bank credit. But bank debits as the 
sole measure of the volume of trade are inadequate and often mis- 
leading. Debits arise from security sales, from real estate transactions, 
from dividend payments, from fiscal operations, as well as from com- 
mercial transactions. They do not and cannot be accurately represent- 
ative of the volume of trade and industry. This was illustrated by the 
fact that at first national totals were used; then New York City debits, 
which were found to be too much under the influence of security trans- 
actions, were eliminated; later all large cities were eliminated, and only 
debits in small towns were taken and were still supposed to be a measure 
of the entire country’s trade. When the volume of debits is corrected, 
as it is called, by an index of prices—wholesale or retail or synthetic— 
then in the best case it corresponds to physical volume of trade, which 
can be better derived by other methods, and in the worst case it does not 
correspond to physical trade, so that you must then seek explanations 
of the discrepancy, adjust and manipulate. That may be a useful
discipline and may even lead to useful discoveries of long-time
relationships. But where does it leave debits as a direct index to be currently
used as a measure of the volume of trade? It leaves them in the
discard.
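The two uses of debits discussed above, as a measure of deposit turnover and as a series "corrected" by a price index, may be set side by side in a short sketch. Function names and figures are hypothetical, and the choice of price index is left open, as it is in the text.

```python
def deposit_turnover(debits: float, average_deposits: float) -> float:
    """Debits during a period divided by average deposits in that period."""
    if average_deposits <= 0:
        raise ValueError("average_deposits must be positive")
    return debits / average_deposits

def deflate(debits: float, price_index: float, base: float = 100.0) -> float:
    """Debits corrected by a price index whose base period equals `base`."""
    if price_index <= 0:
        raise ValueError("price_index must be positive")
    return debits * base / price_index

# Hypothetical annual figures, in billions of dollars.
turnover = deposit_turnover(debits=600.0, average_deposits=20.0)
corrected_debits = deflate(debits=600.0, price_index=120.0)
```

The first quotient is a statement about deposits and stands on its own. The second is only as good as the assumption that the debits arise from commercial transactions, which is the point at issue, since security sales, real estate transactions, dividend payments and fiscal operations enter the same total.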

Still another figure, this time one that is derivative, may be men- 
tioned as one on which too much stress is often laid. And that is the 
ratio of loans or loans and investments to deposits. It is commonly 
assumed that changes in the ratio are significant either in throwing 
light upon changing conditions in the money market or on the degree 
of pressure on the banks, and even that they afford an index of the 
soundness of the banking situation. The change from 77 to 75 per 
cent in the loan-deposit ratio during the fiscal year 1929-1930, for 
example, indicates that the banking situation has become easier, 
causing lower money rates, and that banks are under less pressure for 
funds than they were a year ago, a conclusion that might well be 
reached with less trouble by looking at the course of money rates. The 
truth is that the ratio figures can properly be used only with consider- 
able adjustment and refinement, and that even after the necessary re- 
finements they seldom tell any story that is not told more directly, 
simply and positively by the statistics that have to be used in making 
the necessary adjustments. 

For purposes of illustration take the decrease from 77 to 75 per cent 
in the ratio of loans to deposits during the past year. In interpreting 
these figures, certain questions to be answered are as follows: 

1. Does the change arise from an increase in the proportion of de- 
mand deposits as compared with time deposits? 

For a growth in demand deposits, as compared with a growth in time 
deposits, involves the tying up of more than half again as much money 
in reserves, i.e., in assets other than loans or investments. 

2. Does it arise from an increase in the proportion of the total busi- 
ness that is being done by city banks as compared with country banks? 

If so, the change may reflect nothing except the fact that city banks 
are required by law to maintain larger reserves than country banks and 
consequently have a smaller proportion of their deposits left to invest. 

3. Does it arise from a decrease in the amount of borrowing at the 
Reserve Banks? And if so, is this decrease due to purchases by the 
Reserve Banks of Government securities in the open market, or to gold 
imports, or to a return to the banks of currency from public circulation? 

If the decrease in the ratio arises from any of these sources, it can be 
accurately measured by the available figures of member bank borrow- 
ings at the reserve banks, monetary gold stock and money in circula- 
tion, all of which are in themselves more significant than the ratio that 
they are mobilized to interpret. In fact, there have been times when
for a prolonged period changes in the loan-deposit ratio have reflected
principally changes in the amount of money in circulation, and at such 
times the curve of money in circulation has accordingly told directly 
the same story that is told indirectly and less pointedly by the ratio 
curve. 

4. Does it arise from an increase in the capital invested in the banks 
by their own stockholders, i.e., in the capital, surplus and undivided 
profits of the banks? 

A growth in these accounts, other things equal, affects the ratio by 
providing funds to the banks, not represented by deposits. 

By the time that all of the corrections, modifications and explana- 
tions have been made the ratio adds nothing to our understanding of the 
situation. If, however, the ratio is used baldly, without qualifications, 
it can be misleading by ascribing to changes in fundamental conditions 
developments that may be largely fortuitous. The loan-to-deposit 
ratio may have been useful before we had direct measures of money 
market factors, but at the present time it is largely obsolete. 
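
The four questions above can be illustrated with a minimal sketch of a stylized bank. Everything in it is an assumption made for illustration: the function name, the deposit and capital figures, the reserve percentages, and the simplification that all funds not held as required reserves are placed in loans or investments. None of these numbers comes from the banking figures discussed here.

```python
def earning_assets_ratio(demand_deposits, time_deposits, capital_funds,
                         reserve_bank_borrowings,
                         demand_reserve_pct, time_reserve_pct):
    """Ratio of loans and investments to deposits for a stylized bank.

    Assumption: every dollar not tied up in required reserves is
    placed in loans or investments.
    """
    deposits = demand_deposits + time_deposits
    required_reserves = (demand_reserve_pct * demand_deposits
                         + time_reserve_pct * time_deposits)
    funds = deposits + capital_funds + reserve_bank_borrowings
    return (funds - required_reserves) / deposits


# Hypothetical figures (not from the source), in millions of dollars.
BASE = dict(demand_deposits=600, time_deposits=400, capital_funds=100,
            reserve_bank_borrowings=50,
            demand_reserve_pct=0.10, time_reserve_pct=0.03)

base_ratio = earning_assets_ratio(**BASE)

# Question 1: deposits shift from time to demand, total unchanged.
shift_ratio = earning_assets_ratio(
    **{**BASE, "demand_deposits": 700, "time_deposits": 300})

# Question 3: member banks repay their borrowings at the Reserve Banks.
repaid_ratio = earning_assets_ratio(**{**BASE, "reserve_bank_borrowings": 0})

# Question 4: stockholders add to capital, surplus and undivided profits.
capital_ratio = earning_assets_ratio(**{**BASE, "capital_funds": 150})

for label, value in [("base", base_ratio), ("deposit shift", shift_ratio),
                     ("borrowings repaid", repaid_ratio),
                     ("more capital", capital_ratio)]:
    print(f"{label:18s} {value:.3f}")
```

Question 2 (city as against country banks) works through the same channel as question 1: a higher reserve percentage on a larger share of the deposits. In each case the ratio moves although nothing has been assumed about the demand for credit.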

Still another figure, or ratio, that is often used to indicate what by its 
nature it is not qualified to show is the ratio of our national gold reserves 
to the combined total of bank deposits and currency in the hands of the 
public. This ratio for the United States is 7 or 8 per cent, compared 
with much higher ratios for other countries. Therefore, it is argued in 
the face of our large excess reserves, that we have no gold to spare. It 
would seem more reasonable to interpret the figures as indicating that 
under our system of doing business largely by check a smaller ratio of 
gold to means of payment is necessary than under a system of currency 
payments. Is there any evidence that there is lack of confidence in our 
currency? Not at all. Is there anything holy in any given ratio so 
long as a lower one is adequate to maintain confidence? Is it not, on 
the contrary, very hopeful in view of the possible future shortage of 
gold, that this country has demonstrated the possibility of doing 
business on a large scale with less gold in proportion to liabilities? 
This is a case where figures are used to draw a conclusion to which they 
do not point, while ignoring an important situation that is revealed by 
them.

Banking figures have improved, methods of analyzing them have 
also improved, but I think that much further useful work can be done 
in analyzing analyses, and in determining which are sound and useful 
and which are misleading or superfluous. Such a review of analyses 
also throws a light, not only on the analyzers, but also on the problem 
of what series of banking figures should be maintained and developed 
further and what series should be curtailed or abandoned. We must
endeavor always to put statistics in their place, to remain their masters
rather than to become their slaves. 

Let me suggest some of the analyses that have proved to be signifi- 
cant. Dr. Burgess will show you how money market conditions can 
best be analyzed with reference to certain Federal Reserve figures: 
gold, currency, member bank reserves and their indebtedness at the 
Reserve Banks. May I suggest that this latter figure is the best indi- 
cator of conditions in different regions? When member banks are in 
debt they feel the pressure, and that is true locally as well as nationally. 
Other items in the Federal Reserve Bank statement, on the other hand, 
may reflect a policy of paying out one or another kind of currency, or 
the extent of participation of a local reserve bank in the system’s invest- 
ment account, or what not. Careful studies under way in the Reserve 
System are also yielding material on movements of funds between dis- 
tricts and on the local demand for currency in the different districts. 

An analysis of banking figures that has not received much attention 
and yet is more significant in interpreting conditions than any others is 
the analysis of the assets of banks into loans to customers and open- 
market loans and investments. For instance, member banks during 
the year ending last September showed a decrease in their combined 
loans and investments of about $500,000,000. Here was a year, there- 
fore, when member banks not only failed to show any growth in ac- 
cordance with the supposititious secular trend of business requirements, 
but actually contracted their credit by $500,000,000. This figure 
taken by itself may suggest that the banks were short of funds and that 
there was for that reason difficulty in finding means of financing the 
requirements of trade and industry. An analysis of these figures,
however, brings out an entirely different picture. Bank operations consist
of two fairly distinct parts: one is supplying the local demands of 
customers who need bank credit for the current operations of their 
business, and the other, which is generally undertaken after the local 
demands have been satisfied, is the placing of any surplus funds in in- 
vestments or in open-market loans. Figures for the year under dis- 
cussion show that the banks diminished their loans to customers by 
about $2,500,000,000, but increased their open-market portfolios, in- 
cluding both loans and investments, by about $2,000,000,000. This 
figure indicates that the banks had an abundance of funds and were 
trying to find ways of using them safely and profitably. In doing that 
they purchased about $1,000,000,000 of open-market loans, including 
acceptances, commercial paper and street loans, and also purchased 
another $1,000,000,000 of securities. Notwithstanding this increase of 
$2,000,000,000 in their open-market loans, however, their total loans
and investments show a decrease of $500,000,000 because the local de-
mand for funds from their customers diminished by $2,500,000,000. 
Most of that decrease was in unsecured loans, which come the closest 
to measuring loans by local trade and industry. These loans were 
contracted by more than $2,000,000,000. The figures as thus analyzed 
indicate that business depression resulted in diminished demands on 
banks for funds and that banks found themselves with an increasing 
amount of funds available for placing outside. It may be that some 
banks were unusually cautious in granting loans because they had little 
confidence in the existing business situation and, therefore, preferred 
open-market paper, but the general inference from the figures is that 
slackened trade resulted in reduced demand for credit, and that the 
abundance of bank funds resulted in the banks seeking employment for 
these funds in open-market loans and in investments. It is this 
abundance of funds that has resulted in a prevailing level of money 
rates as low as at any time in the history of the country. 
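
The subdivision of the total described above amounts to a short piece of arithmetic, sketched here with the rounded figures quoted in the text (in millions of dollars). The dictionary keys and the function name are labels chosen for this illustration only.

```python
# Approximate changes for the year, in millions of dollars, as quoted above.
changes = {
    "loans to customers": -2500,
    "open-market loans": +1000,   # acceptances, commercial paper, street loans
    "securities": +1000,
}


def summarize(changes):
    """Return (change in open-market portfolio, change in total)."""
    open_market = changes["open-market loans"] + changes["securities"]
    total = changes["loans to customers"] + open_market
    return open_market, total


open_market_change, total_change = summarize(changes)
print("open-market portfolio:", open_market_change)
print("total loans and investments:", total_change)
```

The total alone shows only the net contraction; the two components show that it is the result of a large decline in customers' demand partly offset by a large placement of funds in the open market.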

Further analysis of these figures by regions brings out other signifi- 
cant points, but all I wish to point out at this time is that, in general, 
we have found that intensive study of existing material, sub-dividing 
and analyzing totals, has yielded more illuminating results than gener- 
alizations and attempts at reducing the complexity of our economic 
life to the smallest possible number of indicators, preferably one single 
King-indicator. We have reached the conclusion that rather than 
search for the philosopher’s stone that would convert all statistics, 
even statistical dross, into pure gold, it is more profitable to search 
carefully in the mass of figures that reach us for those that will yield 
the pure gold of illuminating interpretation. 










PROGRESS OF MONEY MARKET STATISTICS 
By W. RANDOLPH BURGESS


For many years statistics of most of the essential phases of the money 
market have been compiled and published regularly and promptly. 
Scattered quotations of commercial paper rates are available for as 
early as 1831, and call money and commercial paper rates have been 
quoted weekly since the inauguration of the Commercial and Financial 
Chronicle in 1865. Certain of the items of condition of the large New 
York City banks were published weekly in the Clearing House statement 
beginning in 1853, when that association was organized. Gold exports 
and imports and money in circulation have been reported monthly by 
the Treasury Department since 1878. Four or five times a year ever 
since 1863 the Comptroller of the Currency has collected and published 
the statement of condition of the National Banks, and once a year 
since about 1873, the important facts for all other banks. These 
essential money market figures have thus been readily available with 
some others in less comprehensive or continuous form. The Chronicle 
long collected figures for shipments of currency between New York and 
the interior and from time to time some agency undertook to report the 
rates banks charged their customers. Bank clearings data go back to 
1853. Thus there has been for many years a substantial body of 
statistics relating to the money market. These statistics make it 
possible to an extraordinary extent to reconstruct the past, and give a 
basis of comparison in monetary statistics such as is available in few 
other fields. While over a term of years there had been continuous 
improvement in the amount and quality of money market statistics, 
the coming of the Federal Reserve System was a milestone in statistical 
progress in this field. 

Developments in basic money market statistics since the establish- 
ment of the Federal Reserve System have included both a considerable 
enlargement of the scope and precision of statistics and greater fre- 
quency in reporting. 

Under the first heading has come a vast extension of the statistics of 
banks. Instead of a scanty weekly report for New York City banks 
alone we now have weekly a fairly comprehensive report for about 40 
per cent of the country’s bank resources, including among the new 
items the much-discussed brokers’ loans.

The report of bank clearings has been transformed into the more 
complete report of all bank debits and a wider area covered. A new
report shows the amount of commercial paper outstanding. Data for
bank loan rates to customers have been greatly extended and improved. 
A careful annual report by the Department of Commerce traces the 
changes in the important elements in the international balance of pay- 
ments, including the movements of funds. 

As to the greater frequency of reports, perhaps most progress has 
been made in banking statistics, with reasonably comprehensive weekly 
reports. Money in circulation is reported weekly, as well as monthly. 
Gold exports and imports and earmarkings are reported daily. 

But for the student of the money market the most interesting addi- 
tion to the picture is the statistics of the new Federal Reserve System, 
whose 36 items reveal to an extraordinary degree the tendencies in the 
financial situation. 

These are all considerable changes in the mass of available data, and 
the tendency has been to complicate greatly the picture which presents 
itself to the student. The Federal Reserve statement itself is a rather 
bewildering affair. With it all, in fact, question may well be raised 
whether the understanding of the current position of the money market 
is not now more difficult than before, when the machinery and the 
statistics were more scanty and more simple. 

An illustration of former simplicity and present complexity of state- 
ment is provided by the old New York City Clearing House statement 
compared with the present Federal Reserve statement. It may per- 
haps seem strange to bring these two statements into comparison, but 
their functions are indeed similar; both report the position of bank 
reserves. They are the two weekly statements upon which the student 
of the money market has had perforce to concentrate his attention. 

The New York Clearing House statement from August 6, 1853, gave 
for all the Clearing House banks the average reserves for the week and 
the average deposits, from which the reserve requirements were com- 
puted. Beginning February 8, 1908, figures on the actual position of 
the banks at the close of the week were added. The Clearing House 
statement revealed the surplus or deficit of bank reserves. 

The significance of this statement lay first in the fact that the New 
York City banks were the principal depository of the country’s basic 
bank reserves. Banks elsewhere were allowed by law to keep a large 
part of their reserves in New York City. And since the money market 
was in New York, idle funds also naturally gravitated there. The New 
York reserve position thus represented the reserve position for the 
country as a whole. When New York bank reserves were in large 
surplus, it was an indication that the country’s banking position was 
secure, and vice versa, when New York reserves were in deficit it
implied a shortage throughout the country. Thus the simple state-
ment of surplus or shortage of reserves showed at a glance the country’s 
banking position. There was an extraordinarily close correlation 
between this reserve position and money rates. The Pearsonian co- 
efficient of the logarithms of the data for the years 1904 to 1909 is .76. 
Deficits presaged trouble and surpluses were a sure indication of easy 
money. These simple figures were thus an excellent index of money 
conditions. 

With the coming of the Reserve System the old Clearing House 
statement went into the discard as an index of the money position. 
For in later years the statement showed average bank reserves always 
just a shade over the average requirement, while actual reserves closed 
the week in surplus or deficit as chance might dictate without causing 
any alarm or elation except on the part of a few old-fashioned com- 
mentators who never became reconciled to the changed fortunes of the 
statement. 

In place of the relatively simple Clearing House statement we now 
have a complicated Federal Reserve statement containing 36 items 
with a few additional addenda and footnotes. When this change first 
took place it was bewildering to most people and still is to many. It 
was natural to seek in the new statement the nearest possible substitute 
for the old Clearing House statement, and that appeared to be the 
reserve ratio—the ratio of total reserves of the twelve Reserve Banks 
to their deposit and note liability. This ratio appeared to be a logical 
successor to the old bank reserve statement; it showed for the Reserve 
System approximately how much margin it had above legal require- 
ments, just what the Clearing House statement had shown for the New 
York Clearing House banks, which had been the nearest approach to 
reserve banks for this country. 

But for a number of reasons this figure did not prove significant. 
The primary trouble was that, since 1922 at least, the Reserve System 
has had such large gold reserves that its ratio never comes anywhere 
near the legal minimum; in fact the margin of safety has been so great 
that fluctuations in the ratio have had no great importance. They 
have had, for example, little relation with money rates. There may 
come a time when the reserve ratio is restored to importance by reason 
of such huge gold losses or such large increases in liabilities as would 
bring the ratio nearer to the legal minimum, or to what Bagehot terms
the “apprehension point”; so that fluctuations in the ratio would
again be watched closely. But that time now seems distant. 

In the meantime the search has gone on to find in the complicated 
Federal Reserve statement some substitute for the old simple Clearing
House figure as an index of the position of the money market. Various
elaborate formulae have been tried but none has been sufficiently con- 
vincing to gain many disciples. 

The suggestion is here offered that the best single figure by which the 
status of the money market may be measured, and one comparable in 
value to the Clearing House statement of reserves, is the single figure of 
total bills discounted. If we approach the problem objectively we find 
a close correlation between the bills discounted item and interest rates. 
For the years 1924 through 1929 the average monthly data for bills 
discounted and time money rates showed a Pearsonian coefficient of .87. 
If bills discounted at New York City banks alone are used the co- 
efficient is .70. Here then is a relationship comparable in closeness 
with the relation of money rates and the old Clearing House figure for 
reserve surplus or deficit. The reason is that bills discounted and 
surplus or shortage of reserves are really two measures of much the 
same thing. The situation may be put crudely by saying that in 
recent years the banks have almost always had a deficit in reserves, 
which they have made up by borrowing at the Federal Reserve Banks. 
The amount of the borrowing is thus simply the amount of the deficit 
in reserves. There is, of course, the important difference that deficits 
of this sort are expected as a normal part of the operation of the Federal 
Reserve mechanism, and the quantitative relation between a deficit of 
this character and money rates is wholly different from that character- 
istic of pre-Federal Reserve days. The effect of a $190,000,000 fluctua- 
tion in bills discounted is less than that of a $10,000,000 change in 
reserves before. In other words we have achieved a greater elasticity 
in our banking system, or to put it another way, sensitivity has been 
reduced. To the philosophically minded, the question may arise 
whether the banking system has not indeed become too insensitive. 
But that is a question rather beyond the scope of the present paper. 
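
For readers who wish to repeat this kind of comparison, the Pearsonian coefficient can be computed as in the sketch below. The two monthly series are hypothetical and chosen only to show the mechanics; they are not the data behind the coefficients of .87 and .70 reported above, and the function name is likewise an assumption of this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearsonian (product-moment) coefficient of correlation."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly averages (not from the source).
bills_discounted = [300, 450, 600, 800, 950, 700]   # millions of dollars
time_money_rate = [3.0, 3.5, 4.2, 5.0, 5.6, 4.6]    # per cent

r = pearson_r(bills_discounted, time_money_rate)

# The same function applied to logarithms, as in the Clearing House
# comparison; this assumes every observation is positive.
r_log = pearson_r([math.log(x) for x in bills_discounted],
                  [math.log(y) for y in time_money_rate])

print(round(r, 3), round(r_log, 3))
```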

While it might fairly be contended that the amplitude of statistics of 
recent years has given us no better index of the current changes in 
money conditions than was available in the old Clearing House state- 
ment, the case is far different when the discussion turns to an explana- 
tion of money market changes. The coming of the Reserve System 
removed at one stroke most of the obstacles to an analysis of the causes 
of money market changes. There had been in years past, it is true, 
many ingenious money market studies. One of the most ingenious and 
convincing was that of Professor David Kinley (later President of the 
University of Illinois). 

This study appears in Dr. Kinley’s work on the Independent Treasury 
of the United States, published as a section of the report of the National
Monetary Commission, in which he showed among other things the
influence of ordinary operations of the independent Treasury system on 
the money market. Taking the reserves of the Clearing House banks 
as a measure of the amount of funds at the disposal of the money mar- 
ket, Dr. Kinley made use of two sets of figures published by the Com-
mercial and Financial Chronicle. The first figures showed that part of
the change in reserves attributable to the actions of the Subtreasury,
and the second figures showed the movement of currency to and from
the interior. The sum of these two items went far to explain the week-
to-week change in Clearing House bank reserves.

An important omission from this study, due to unavailability of data,
was the net payment or receipt of currency into or from circulation, and 
data for gold movements were also unavailable currently. Another 
difficulty arose from the character of the report of bank reserves. For 
most of the period under consideration only the average reserves of the 
Associated banks were reported and not their actual reserves at the 
close of the week. These deficiencies in the basic data made it im-
possible to apply more broadly Dr. Kinley’s most interesting method
of study. 
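
The accounting behind this method, as described above, may be sketched as follows. The weekly figures are hypothetical, and the function name is an assumption of this illustration; the residual stands for whatever the two published series leave unexplained, such as the currency and gold movements for which current data were lacking.

```python
def explain_reserve_changes(weeks):
    """For each week return (explained change, unexplained residual).

    Each week is a tuple: (change in Clearing House bank reserves,
    change attributed to Subtreasury operations,
    net movement of currency from the interior).
    """
    results = []
    for reserve_change, subtreasury, interior in weeks:
        explained = subtreasury + interior
        results.append((explained, reserve_change - explained))
    return results


# Hypothetical weekly figures in millions of dollars (not from the source).
hypothetical_weeks = [
    (5.0, 2.0, 2.5),
    (-3.0, -1.5, -2.0),
    (1.0, 0.5, 0.5),
]

for explained, residual in explain_reserve_changes(hypothetical_weeks):
    print(f"explained {explained:+.1f}  residual {residual:+.1f}")
```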

If a statistician were to set about devising a banking system which 
would give him as a by-product of its operations comprehensive figures 
as to the causes operating upon the money market, which might apply 
broadly and accurately Dr. Kinley’s method of approach, he would 
find difficulty in devising a scheme better designed to accomplish this 
purpose than the Federal Reserve System. For the Reserve Banks 
have drawn together within themselves those banking functions which 
most closely affect the market. The flow of currency into and out of 
circulation is through the Reserve Banks; gold movements are largely 
through them; transfers to and from other localities are centralized 
there; and changes in the demands for bank credit are reported regu- 
larly to the Reserve Banks as bank reserve requirements change. The 
basic bank reserves of the money market banks are deposited in the 
Reserve Bank. Every important money market change effects some 
change in those reserves and every such change is recorded in a Federal 
Reserve operation. 

Under these circumstances it should not be necessary to rely so much 
upon inference and guesswork as in the past in understanding the 
operation of cause and effect in the money market. For some years 
the Federal Reserve Bank of New York has used in its daily operations 
a current report measuring from hour to hour the factors affecting the 
market: the currency payments and receipts; the transfers into and out 
of New York; Treasury payments and withdrawals; gold movements;
the changes in the amounts of various forms of Federal Reserve credit
in use; and any other influences upon the market. These reports form 
a basis for the operating policies of the Bank relative to the money 
market. 

Hourly and daily figures of this sort are not available to other stu-
dents of the money market. But in recent months the principal data for
such studies have become available on a weekly basis. A few months
ago the Federal Reserve Board began including in its weekly statement
of the condition of the Federal Reserve System a brief table which has
perhaps escaped the notice of many. It is, however, a table which
should be greeted by statisticians as tending to mark a new epoch in the
statistics of the money market, for it makes generally available for the
first time data which up to now have been accessible only to the stu-
dent within the Reserve System. This table summarizes accurately the
principal causes operating on the money market and is as follows for the
middle of October, 1930, compared with the corresponding date for
October, 1929. This date is chosen because it gives a picture of the
money market changes from a week before the stock market crash to
the depths of the business depression.


                                                        Increase or decrease since
                                          October 15,   October 8,    October 16,
                                             1930          1930          1929
                                                  (In millions of dollars)
Bills discounted ........................     210          +37           −639
Bills bought ............................     185          −26           −175
United States securities ................     602          + 2           +464
Other reserve bank credit ...............      47          +19           − 63
TOTAL RESERVE BANK CREDIT ...............   1,044          +32           −413
Monetary gold stock .....................   4,515          + 1           +136
Treasury currency adjusted ..............   1,796          +12           + 15
Money in circulation ....................  [illegible]     +10           −292
Member bank reserve balances ............   2,440          +32           + 32
Unexpended capital funds, nonmember
  deposits, etc. ........................     419          + 3        [illegible]


The first five items in the table show the kinds and total of Reserve 
Bank credit in use. The next show outside factors: gold stock, two 
currency items, bank reserves and miscellaneous adjustment items. 
This simple table tells the story of the year’s changes in the market, 
somewhat as follows: 

The money position has grown very easy, since Federal Reserve 
discounts are at the extraordinarily low figure of 200 million dollars. 
There have been three major causes for this change: an increase in the 





Oe te 


ee 


SAE INI A Katrin. Stn A tl ar BME 


bil Wines endl rutiododiunesiaiptindh niente aa ee 


eee a oe 








154 American Statistical Association 


country’s gold stock, a decrease in money in circulation (reflecting 
business depression and price declines), and large purchases of govern- 
ment securities by the Federal Reserve Banks. One rather surprising 
feature of this analysis is that it shows that the liquidation of bank 
credit has not been a factor directly effecting credit ease, for bank 
reserve requirements are practically unchanged from a year ago. 
This very fact suggests some of the peculiar circumstances of this 
depression. There was, it is true, much liquidation of credit accom- 
panying the decline in prices, but this liquidation occurred not in bank 
credit, which remained relatively constant, but in loans made by others 
than banks. More specifically it took the form of a decline in loans on 
securities made by business corporations and individuals. In some 
measure the banks had to take over parts of these loans during the past 
year. This in part accounted for the stability during the year in the 
total volume of bank credit. While the total of bank credit shows little 
change, the change in its character and the liquidation of these loans by 
others has removed strain from the credit situation. Such a removal of 
strain is illustrated by a decline in the velocity of bank deposits from 
an abnormally high to a normal figure. This shift in the credit picture, 
while not reflected in total bank deposits or in the balance sheet of bank 
reserves, undoubtedly was a factor in easy money. It is one of the 
imponderables, but judged by the past, a minor factor in the determina- 
tion of money rates. The primary causes are contained in the little 
table quoted above. Money rates have moved with discounts and the 
movement of discounts was directly caused by changes in currency, 
gold and Federal Reserve security holdings. 
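
The internal consistency of the table quoted above can be checked with a few lines of arithmetic. The sketch below uses only figures legible in the table. The first check confirms that the four kinds of Reserve Bank credit add to the reported total in every column. The second check rests on an assumption of this illustration, namely that the items supplying funds (Reserve Bank credit, gold stock, Treasury currency) must balance the items absorbing them (money in circulation, member bank reserve balances, and the miscellaneous item); it is applied to the weekly changes only, since that is the one column legible in every row.

```python
# Figures in millions of dollars:
# (level on Oct. 15, 1930, change since Oct. 8, 1930, change since Oct. 16, 1929)
credit_items = {
    "Bills discounted": (210, 37, -639),
    "Bills bought": (185, -26, -175),
    "United States securities": (602, 2, 464),
    "Other reserve bank credit": (47, 19, -63),
}
reported_total = (1044, 32, -413)


def column_sums(rows):
    """Add the rows column by column."""
    return tuple(sum(column) for column in zip(*rows))


computed_total = column_sums(credit_items.values())
print("components add to reported total:", computed_total == reported_total)

# Weekly changes only. Assumed identity: funds supplied = funds absorbed.
weekly_supplied = 32 + 1 + 12   # total credit, gold stock, Treasury currency
weekly_absorbed = 10 + 32 + 3   # circulation, member reserves, miscellaneous
print("weekly changes balance:", weekly_supplied == weekly_absorbed)
```

The same columns show the point made in the text: over the year bills discounted fell by 639 millions while security holdings rose by 464 millions, gold stock rose by 136 millions and money in circulation fell by 292 millions, with member bank reserve balances nearly unchanged.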

The simplicity of this table may have concealed its value just as it 
perhaps also conceals the many, many hours of labor which have gone 
into its preparation to reduce its figures to precisely accurate and com- 
parable form. It may be hoped that this table will introduce to general 
use a method of money market analysis which has become possible 
only with the existence of the Federal Reserve System, which carries 
to its logical conclusion those methods of analysis used by Professor 
Kinley in 1910, and which should in time, as data accumulate over a 
period of years, remove from discussion of the money market much of 
the cloud of conjecture and mystery which has surrounded it. 

























STATISTICAL METHODS IN BIOLOGY 


By SEWALL WRIGHT 


The applications of statistical methods in biology are essentially 
identical in purpose with those in other fields of science. There are, 
however, some differences in form and emphasis, imposed by the kind 
of material. In all of the characteristics of the individuals of the mil- 
lion or more species of animals and plants there is variability, not the 
errors of observation of the physicist, but real variability, of interest on 
its own account. An enormous field is presented here for statistical 
methods in merely bringing the phenomena of life into an adequate 
descriptive form which the mind can grasp. 

It was necessary for the pioneer biometricians to readapt the methods 
developed for use in the physical sciences, to their kind of material. 
The classical normal probability curve, applicable enough as a rule to 
the treatment of random errors, was wholly inadequate for the descrip- 
tion of biological variability. Pearson’s system of frequency curves as 
one solution of this difficulty is familiar. Similarly, the methods of 
simple and multiple correlation developed by Galton and Pearson met a 
descriptive need in biology, not encountered in the physical sciences. 
Mathematically these methods were simply adaptations of the method 
of least squares, but there was a significant change in viewpoint. 

The second type of application to which I shall refer is that of the 
determination of the significance of differences, whether between statis- 
tics of different natural populations or between results of experiments. 
Here a more direct borrowing of the methods of physical science was 
possible. But in addition to use of the classical probable error, we 
have Pearson’s χ² method for comparing systems of frequencies and the
methods of “Student” and R. A. Fisher for dealing accurately with
probabilities in the small number of paired observations characteristic 
of biological experiment. Genetics has been dependent on statistical 
methods from the first. More recently, physiologists, anatomists, 
ecologists and others are coming to realize their importance. 

A third application is analogous to the physicist’s use of the theory of 
probability in the kinetic theory of gases and in the more recent de- 
velopments of statistical mechanics. Mendel’s interpretation of his 
experiments in heredity was a simple example of this sort. The inter- 
pretation of the properties of populations, including the theory of 
evolution, as statistical consequences of the genetics of individuals is 
perhaps the most important example. 














156 American Statistical Association 


I shall devote most of my time to an application which mediates 
between two of the above fields, viz., the field of interpretation of 
statistical descriptions. Biology differs from physics and chemistry in 
dealing with real variability and thus in having a problem of statistical 
description. In this it resembles the social sciences, but differs from 
those and approaches physics and chemistry in the degree to which 
laboratory experiments can be conducted. The two modes of approach,
statistical and experimental, should supplement each other in giving in- 
sight into natural phenomena. Actually they are apt to be conducted 
from such different philosophical viewpoints that they lead to seem- 
ingly antagonistic interpretations. We have such unhappy situations 
as the existence of two sciences of heredity, Mendelian and biometric, 
scarcely on speaking terms. In biology, at least, we need a technique 
for interpreting the statistical relations of systems of variables in terms 
of our knowledge of causal relations, derived in the laboratory. 

In connection with such interpretation, there is a certain contrast 
between the kind of interrelation of variable quantities which the 
physicist encounters and that frequently encountered by the biologist. 
The variables of the former are usually in a movable equilibrium, 
dependent on reversible processes. One speaks of the functional re- 
lations of the components of such a system, rather than of causation. 
The tendency of physics, emphasized recently by G. N. Lewis, is to 
insist on the complete symmetry of its time. 

The biologist, on the other hand, is to a large extent concerned with 
variables which at his level of observation are related in irreversible 
sequence. He deals with the development of individuals from egg to 
adult, and with the evolution of species. His hereditary units affect 
the characteristics of individuals which possess them but are not themselves
affected. Most of the environmental factors with which he is
concerned, act upon organisms without being acted upon to an im- 
portant extent. Thus the conception of one-way causation is a useful 
one at the biological level and any treatment of systems of biological 
variables which disregards sequence (where present) omits the very 
part in which the biologist is most interested. Our technique of inter- 
pretation of statistical systems must then take account of sequential 
relations as well as of symmetrical relations. 

A qualitative interpretation of a system of variables (or if one pleases, 
a mere arbitrary point of view) is conveniently represented by a dia- 
gram in which arrows are used to indicate which variables are to be 
treated as functions of which others. Such a diagram is especially 
adapted to representation of one-way causation, but is not limited to 
such relations. Unanalyzed correlations may be represented by two-headed
arrows, to indicate connection through common factors (Chart I).

CHART I
[Path diagram and accompanying equations; not legible in the source.]

It is convenient to measure each variable in terms of its standard
deviation. Letting $x_0 = (X_0 - \bar{X}_0)/\sigma_0$, etc., we can write the best linear
expression for deviations of a given variable in terms of those from
which arrows are drawn to it in the form:


$$x_0 = p_{01} x_1 + p_{02} x_2.$$


The coefficients are abstract numbers, which I have called path 
coefficients, related numerically to the concrete partial regression 
coefficients in the same way that the correlation coefficient is related 
to total regression. They differ from correlation coefficients, however, 
in having direction. Their usefulness depends on an easily demon- 
strated relation to correlation. For any two variables of such a system, 
the correlation can be analyzed into contributions tracing through the 
represented factors of either one. Letting $s$ stand for the factors of
$X_0$ and $t$ for those of $X_1$,


$$r_{01} = \sum_s p_{0s} r_{1s} = \sum_t p_{1t} r_{0t}.$$


By further analysis of the correlation terms, this leads to the easily 
remembered principle that any correlation can be analyzed into con- 
tributions from all of the paths through the diagram (direct or through 
common factors) by which the two variables are connected, and that 
each of these contributions is the product of the coefficients pertaining 
to the elementary paths. One of these elementary paths in each case 
may be an unanalyzed bidirectional one, measured by a correlation 
coefficient. 
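
The relation between a path coefficient and the concrete partial regression coefficient, stated above, may be illustrated by a minimal sketch. It assumes the usual standardization, in which the concrete coefficient is multiplied by the ratio of the standard deviation of the factor to that of the dependent variable. The function name and all numbers are hypothetical and serve only for illustration.

```python
def path_coefficient(b, sd_factor, sd_dependent):
    """Convert a concrete partial regression coefficient b into an abstract
    (standardized) coefficient, assuming p = b * sd_factor / sd_dependent."""
    return b * sd_factor / sd_dependent


# Hypothetical values: a concrete coefficient of 2.0 units of X0 per unit of X1,
# with standard deviations 3.0 for X1 and 10.0 for X0.
b_01 = 2.0
sd_x1 = 3.0
sd_x0 = 10.0

p_01 = path_coefficient(b_01, sd_x1, sd_x0)
print(round(p_01, 3))
```

The result is an abstract number, free of the units of measurement, which is what allows coefficients along a path to be multiplied together.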

As a special case, the correlation of a variable with itself (unity) 
may be analyzed in this way, assuming that there is complete deter- 
mination by the factors represented. 











$$\sum_s p_{0s} r_{0s} = 1.$$


(If determination is not complete the sum of such products gives the 
squared multiple correlation.) 
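
As a rough numerical check of these relations, the sketch below takes the simplest case of two correlated factors acting on one variable. Each correlation is traced as a direct path plus a path through the correlated factor, and the sum of products of path coefficient and correlation is compared with the expanded expression for the degree of determination. The three input values are hypothetical, not taken from any data.

```python
def correlations_from_paths(p01, p02, r12):
    """Trace the correlation of X0 with each factor: the direct path
    plus the path through the other, correlated, factor."""
    r01 = p01 + p02 * r12
    r02 = p02 + p01 * r12
    return r01, r02


def determination(p01, p02, r12):
    """Sum of products p_0s * r_0s over the represented factors."""
    r01, r02 = correlations_from_paths(p01, p02, r12)
    return p01 * r01 + p02 * r02


# Hypothetical path coefficients and factor correlation.
p01, p02, r12 = 0.5, 0.4, 0.3

r01, r02 = correlations_from_paths(p01, p02, r12)
r_squared = determination(p01, p02, r12)

# The same quantity written out term by term.
expanded = p01 ** 2 + p02 ** 2 + 2 * p01 * p02 * r12

print(round(r01, 2), round(r02, 2), round(r_squared, 2), round(expanded, 2))
```

Here the sum falls short of unity, so determination by the two represented factors is incomplete and the sum is the squared multiple correlation.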

In a system which one wishes to analyze, some of the variables 
represented may be ones which have been measured, others may be 
hypothetical. One may deduce unknown correlation coefficients from 
path coefficients, given by knowledge of the functional relations, or 
unknown path coefficients from known correlations, or unknowns of 
both sorts from a mixture of knowns of both sorts. The application 
is, of course, often limited by inability to make a qualitative
interpretation sufficiently definite to be expressed in diagrammatic form, and
even when such a representation can be made, it is only too likely to 
turn out that there are more unknowns than knowns, thus giving 
an indeterminate solution. No quantitative interpretation is then 
warranted until new facts, suggested perhaps by the attempt at formu- 
lation of the problem, have been obtained. 

As a geneticist, I have been especially interested in applications in 
the field of heredity. Let me give as an example a case dealing with 
heredity in man, as perhaps of more general interest than those dealing 
with such animals as guinea pigs. I am taking some data presented 
by Miss B. S. Burks, involving intelligence tests of some 100 California
children, tests of their parents, and in addition carefully constructed 
grades of their home environments. These data were obtained by 
Miss Burks as a control for similar data for some 200 children, adopted 
at an average age of three months. The two groups of parents were 
closely similar. I should say that Miss Burks is not responsible for 
the use to which I am putting her data. 

The observed correlations as corrected by Miss Burks for attenua- 
tion are given in the equations, Chart II. Midparents are used for 
simplicity. The correlation between midparent and home environ- 
ment (culture index) was not calculated for the foster data. Presum- 
ably it was closely similar to the figure for the control data. 

The data suggest certain things rather definitely but in other respects 
interpretation is not obvious. The routine method of treatment is to 
calculate partial correlation coefficients or the closely allied partial 
regression coefficients, treating child’s IQ as a function of parental IQ 
and environment. The results are shown in the figures. The solution 
gives a rather curious result. Environment makes a significant posi- 
tive contribution in the foster data, but in the control data its contribu- 
tion is negative, as far as it goes. The partial correlation coefficients 
differ similarly. How are we to interpret this change from +.35 to
−.13 in the environmental regression coefficients? The answer is,
of course, that we have no right to put any biological interpretation on
them. We have prediction equations, the best which the data yield,
but not an interpretation. For interpretation we must take account
of the causal relations. The IQ of child, and of parent, and the grade
of home environment are not functionally related after the simple
fashion of volume, pressure and temperature of the gas law. We know
that the characteristics of the child trace to two distinct groups of
biological factors, the constant internal factor, heredity, present in the
chromatin material of the child’s cells, and the external factors to which
this heredity and its products have reacted in the developmental
process. There is one-way causation by heredity and doubtless the
same to at least a first approximation by environment.

CHART II
[Path diagrams for adopted and own children, relating Parents’ I.Q. (P) and environment (E) to Child’s I.Q. (C); not legible in the source.]

Regression equation:
$$C = \bar{C} + p_{CP}\frac{\sigma_C}{\sigma_P}(P - \bar{P}) + p_{CE}\frac{\sigma_C}{\sigma_E}(E - \bar{E})$$

Normal equations (adopted children), with $r_{EP} = +.86$ (?):
$$r_{CP} = p_{CP} + p_{CE} r_{EP} = +.23$$
$$r_{CE} = p_{CE} + p_{CP} r_{EP} = +.29$$

Normal equations (own children), with $r_{EP} = +.86$:
$$r_{CP} = p_{CE} r_{EP} + p_{CP} = +.61$$
$$r_{CE} = p_{CE} + p_{CP} r_{EP} = +.49$$
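
The two environmental regression coefficients quoted above can be checked from the observed correlations. The sketch below solves the pair of normal equations for child's IQ on midparental IQ and home environment in standard measure. It assumes, as the text does provisionally, that the midparent-environment correlation of +.86 holds for the foster group as well as the control group. The function and variable names are illustrative only.

```python
def standardized_coefficients(r_cp, r_ce, r_ep):
    """Solve the two normal equations
         r_cp = p_cp + p_ce * r_ep
         r_ce = p_ce + p_cp * r_ep
    for the standardized partial regression coefficients."""
    denom = 1.0 - r_ep ** 2
    p_cp = (r_cp - r_ce * r_ep) / denom
    p_ce = (r_ce - r_cp * r_ep) / denom
    return p_cp, p_ce


# Correlations as given in the text and charts (corrected for attenuation).
foster_p_cp, foster_p_ce = standardized_coefficients(0.23, 0.29, 0.86)
control_p_cp, control_p_ce = standardized_coefficients(0.61, 0.49, 0.86)

print("foster  environment coefficient:", round(foster_p_ce, 2))
print("control environment coefficient:", round(control_p_ce, 2))
```

The change of sign arises wholly from the arithmetic of the normal equations, which is the sense in which these are prediction equations and not an interpretation.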

The IQ of parents is related to home environment both directly and 
indirectly. For the moment, it will suffice to indicate this by a double- 
headed arrow. Heredity of foster children should be independent both 
of parental IQ and environment since Miss Burks shows that there was 
no possibility of selective adoption with respect to intelligence. In 
the control data, parental IQ should be correlated with child’s heredity 
in various ways, all indirect (as being through parental heredity), and 
the diagrammatic representation must be by a two-headed arrow. It is
important to recognize that parental IQ is very far from being the 
child’s heredity. Finally, in the case of human intelligence, it is to be 
expected that there will be some correlation between child’s heredity 
and environment. Good heredity in preceding generations should have 
built up the conditions for a favorable environment. The simplest 
interpretations which can possibly be considered adequate biologically 
are those of Chart III. 





























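CHART III
[Path diagrams for adopted and own children; not legible in the source. C, child’s IQ; P, midparental IQ; E, home environment; H, child’s heredity.]

Adopted children:
$$r_{CE} = p_{CE} = +.29$$
$$r_{CP} = p_{CE} r_{EP} = +.23$$
$$p_{CE}^2 + p_{CH}^2 = 1$$

Own children:
$$r_{EP} = +.86$$
$$r_{CE} = p_{CE} + p_{CH} r_{HE} = +.49$$
$$r_{CP} = p_{CE} r_{EP} + p_{CH} r_{HP} = +.61$$
$$p_{CE} = \frac{.29}{.96}\, p_{CH}$$
$$p_{CH}^2 + p_{CE}^2 + 2 p_{CH} p_{CE} r_{HE} = 1.00$$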


The analysis of the foster data is very simple. If IQ of the foster 
parents is related to child’s IQ only through correlation with home en- 
vironment, the parent-offspring correlation should be the product of 
the two intermediary coefficients. This leads to a value of the correla- 
tion between midparent and environment (+.79) closely similar to 
that observed in the control data. This indicates that there was no 
influence of the parents other than through the home environment as 
actually measured. There was only about 9 per cent determination of 
variance by home environment (.29²) leaving a residuum of 91 per cent
determination and a residual path coefficient of about .96. How far 
this traces only to child’s heredity and how far to unmeasured environ- 
mental factors the data give no answer. But since home environment 
is presumably much the most important environmental factor (cases 
in which there had been illness likely to affect mentality having been 
excluded), one may surmise that the residual group is largely heredity. 
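
The arithmetic of the foster analysis may be sketched as follows, under the stated supposition that foster parents influence the child's IQ only through the measured home environment. The variable names are illustrative. The correlations are those given in the text and charts.

```python
import math

# Observed correlations for the adopted children (corrected for attenuation).
r_ce = 0.29  # child's IQ with home environment
r_cp = 0.23  # child's IQ with midparental IQ

# With environment the only represented factor, the path coefficient
# from environment to child's IQ equals the correlation itself.
p_ce = r_ce

# The parent-child correlation is then the product p_ce * r_ep,
# so the midparent-environment correlation can be deduced.
r_ep = r_cp / p_ce

# Degree of determination by environment, and the residual path
# under the condition of complete determination.
determination_by_environment = p_ce ** 2
residual_path = math.sqrt(1.0 - determination_by_environment)

print(round(r_ep, 2), round(determination_by_environment, 3), round(residual_path, 2))
```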

In the other group, the situation is more complex. We can at once 
write three equations representing analysis of the three known corre- 
lations. If we assume that the only factor of child’s IQ apart from 
the home environment as measured is heredity, we can write a fourth 
equation expressing complete determination. But there are five 
coefficients to be determined. No solution is possible and no quantita- 
tive interpretation is possible from the data of the control group. This 
is not a fault of the method. It is rather a merit that it brings clearly 
to light the impossibility of any biological interpretation without 
further data. 






















In the present case, however, we have another resource. The control 
group of parents was carefully selected for comparability with the foster 
group. Presumably home environment has closely similar effects in 
the two cases. We should be able to borrow the environmental coef- 
ficient from the foster data. Theoretically, however, it is the concrete 
partial regression coefficient and not the path coefficient which is 
directly transferable, the latter being affected by the correlation be- 
tween heredity and environment in the control data. From this it 
may be deduced that the ratio $p_{CE} : p_{CH}$ should be the same in the two
cases, giving a fifth equation. 

These five equations differ from the normal equations of the ordinary 
application to multiple regression in not all being linear. They are 
easily solved, however, with the results indicated in Chart III. It may 
be well to emphasize the point that the fact that it was possible to use 
the relation between environment and child’s IQ of the foster data in 
the control data and obtain results conforming to the observed correla- 
tions in the latter shows that the apparent contradiction in the partial 
correlations in the two cases was an illusion. 
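
One way the five equations might be solved is sketched below. It is an illustration under the assumptions just stated, not necessarily the route actually followed: the ratio of the environmental to the hereditary path coefficient is borrowed from the foster data (.29 to the residual path of about .96), which reduces the condition of complete determination to a quadratic in the hereditary coefficient. Symbols follow the text: C child's IQ, P midparental IQ, E home environment, H child's heredity. The function name is hypothetical.

```python
import math


def solve_control(r_ce, r_cp, r_ep, ratio):
    """Solve, for the control (own children) data,
         r_ce = p_ce + p_ch * r_he
         r_cp = p_ce * r_ep + p_ch * r_hp
         1    = p_ch**2 + p_ce**2 + 2 * p_ch * p_ce * r_he
         p_ce = ratio * p_ch
    Substituting the first and last into the third gives
         (1 - ratio**2) * p_ch**2 + 2 * ratio * r_ce * p_ch - 1 = 0."""
    qa = 1.0 - ratio ** 2
    qb = 2.0 * ratio * r_ce
    p_ch = (-qb + math.sqrt(qb ** 2 + 4.0 * qa)) / (2.0 * qa)  # positive root
    p_ce = ratio * p_ch
    r_he = (r_ce - p_ce) / p_ch
    r_hp = (r_cp - p_ce * r_ep) / p_ch
    return p_ce, p_ch, r_he, r_hp


# Ratio borrowed from the foster data: .29 against the residual path.
foster_ratio = 0.29 / math.sqrt(1.0 - 0.29 ** 2)

p_ce, p_ch, r_he, r_hp = solve_control(0.49, 0.61, 0.86, foster_ratio)

for name, value in [("p_CE", p_ce), ("p_CH", p_ch), ("r_HE", r_he), ("r_HP", r_hp)]:
    print(name, round(value, 2))
```

On these assumptions the correlation between midparental IQ and child's heredity comes out at about +.42, in agreement with the figure discussed in the text.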

One may be struck by the low correlation (+.42) between mid- 
parental IQ and child’s heredity in contrast with the high correlation 
(+.86) observed between the former and home environment. 

It appears that midparental IQ is a much better index of home envi- 
ronment than of child’s heredity. This is not surprising, however, in 
the light of genetic theory. Even the correlation between midparental 
heredity and child’s heredity is theoretically only $\tfrac{1}{3}\sqrt{2}$ or .47 under
certain common conditions (complete dominance present and no
assortative mating). It may be worth while to carry the analysis a
generation back, still averaging the parents for simplicity.

CHART IV
[Path diagram; not legible in the source.]

“Misc.” includes contributions to σ² due to:
1. Nonadditive combination effects of genes (dominance, epistacy)
2. Nonadditive combination effects of heredity and environment
3. Residual environmental factors (uncorrelated with parental I.Q.)

For this analysis, it is convenient to deal with heredity in a dif- 
ferent way. In place of heredity as a factor in development, we shall 
use the genotype as the sum of such gene contributions as best ap- 
proximate the developmental ranking. The two measures are identical 
only if dominance is wholly lacking and there are no epistatic effects 
(i.e., if the effects of independent series of genes combine additively). 
This will give us a minimum instead of a maximum estimate of the 
genetic factor, compatible with acceptance of the observed correlations. 
We must now recognize a residual factor in both generations, theo- 
retically composed of three very diverse elements, which, however, 
cannot be distinguished in the present data. These are the usually 
important contributions of dominance and epistacy to variance, just 
referred to, which are purely hereditary factors from the develop- 
mental viewpoint; second, environmental factors not included in the 
measure of home environment, and as indicated by the foster data, not 
correlated with the parents; and third, possible contributions to vari- 
ance due to non-additive effects of heredity and environment in relation 
to each other. 

Home environment is treated as in part created by the IQ of the 
parents (direct path) and in part as tracing from the previous environ- 
ment of the parents, as would be true of the effects of inherited wealth 
and family tradition. The possibility of some independent determina- 
tion is indicated by a third arrow. 

Child’s genotype traces, of course, to midparental genotype but is not 
completely determined thereby because of the intervening phenome- 
non of Mendelian segregation. If there were no assortative mating 
the correlation (and also path coefficient) is $\sqrt{1/2}$ or .707. In the present
data there was strong assortative mating, .55, to be raised to about
.70 on correcting for attenuation. This raises the value of the above
path coefficient to a slightly varying extent, depending on the nature 
of the assortative mating. The value, .78, can be accepted as reliable 
within a smaller range than any of the observed correlations. 
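
The theoretical value of .707 for random mating can be checked by a small simulation. The sketch below is an illustration under simplifying assumptions that are not part of the original analysis: a purely additive genotype, independent loci each with two equally frequent alleles, and parents mated at random. The number of loci, the number of families and the seed are arbitrary.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(families=5000, loci=10, seed=1):
    """Correlation of midparental with child's additive genotype
    under random mating and Mendelian segregation."""
    rng = random.Random(seed)
    midparent, child = [], []
    for _ in range(families):
        mid_score = 0.0
        child_score = 0
        for _ in range(loci):
            father = (rng.randint(0, 1), rng.randint(0, 1))
            mother = (rng.randint(0, 1), rng.randint(0, 1))
            mid_score += (sum(father) + sum(mother)) / 2.0
            # Segregation: the child receives one allele at random from each parent.
            child_score += rng.choice(father) + rng.choice(mother)
        midparent.append(mid_score)
        child.append(child_score)
    return pearson(midparent, child)


r_simulated = simulate()
print(round(r_simulated, 3), "expected about", round(math.sqrt(0.5), 3))
```

The simulated value should fall close to the theoretical one. Assortative mating, which is not modelled here, would raise it.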

The diagram has twelve paths. To make it quantitative at least 
twelve equations must be found for solution. We have just derived 
one from Mendelian theory, and three others are given by the three 
observed correlations. The four residual paths correspond to four 
equations of complete determination. The environmental effect on 
child’s IQ, borrowed from the foster data, brings the number up to 
nine. The list could easily be completed, if we could assume that the 
situation back of the parents was the same as back of the children. 




























This is not a necessary assumption, however, if only because the parents 
were tested as adults, the children between 5 and 14 years of age. It
turns out that it is a mathematically impossible assumption. To 
complete the series it was assumed that the ratio of the coefficients in 
the case of the residual group and genotype is the same in the two 
generations (expected if the residual group is largely due to dominance 
and epistacy) ; second, the environmental influence and the correlation 
between genotype and environment were adjusted to agree as well as 
possible; and third, a small arbitrary value was assigned to the residual 
path back of environment (a value practically immaterial if small). 
The solution is not strictly determinate, but is so within rather narrow 
limits. The coefficients are accordingly given only to the nearest 
.05 (Chart IV). 

We have as a result a somewhat roughly quantitative interpretation 
of the relations of the variables in this population, partly based on 
observed correlations and partly based on our knowledge of the mecha- 
nism of heredity. It illustrates sufficiently, I hope, the difference 
between a biological interpretation of statistical data and a prediction 
formula based on the same data. 
















STATISTICAL METHODS APPLIED TO PSYCHOLOGICAL 
PROBLEMS 


By Truman L. Kelley


Psychology as one of the social studies has need of the usual statisti- 
cal techniques that serve so admirably in all social fields. I refer to 
formulas and the basic concepts corresponding to such terms as aver- 
age, measure of deviation, form of distribution, correlation, frequency 
in a class, probability of occurrence. The necessary connection be- 
tween these concepts and the appropriate procedures to be employed in 
answering certain psychological problems could be illustrated practi- 
cally without end. However, since the same is true of all the social 
sciences, I shall not take time to cite such illustrations, but shall con- 
fine my remarks to those statistical concepts which are more or less 
unique to psychology. Certain of these concepts are likewise called for 
in related sciences—biology, economics and medicine—but as these 
latter have not led in the theoretical development and practical use of 
the techniques in question, it is appropriate to look upon them as 
quite peculiarly belonging to the statistics of psychology. 

Growth, as a psychological problem, has much in common with 
biological growth, but it is amenable to nurture to an extreme degree. 
Consider growth in typewriting efficiency, commonly called learning: a 
person of age 6, 16, or 60 will show much the same learning curve—not 
exactly the same, for maturity is presumably a small factor. The 
physiologist studying the growth of the heart must study it as it hap- 
pens in the growing child, and can do little in providing a pabulum 
facilitating or hindering the process. Though growth of the heart and 
of typewriting ability both depend upon nurture and nature and thus do 
not differ in kind, they do differ so markedly in degree that the empha- 
sis is quite dissimilar. The psychologist has the more difficult statisti- 
cal problem, for he must interpret and obtain results as consequent to 
two things, a physiological growth process plus (and plus must not be 
taken in the algebraic sense, for it may mean “times”) a specially
provided set of psychological stimuli which may or may not have been at
his option. To isolate these two things is a statistical problem of no 
mean proportion. It has not as yet been done in such important 
functions as are measured by Binet tests, performance tests and the 
like. Since it is impossible to have children desist in physiological 
growth while the effect of some particular nurture is being investigated,
and since by the time physiological growth is completed, say age
21, there is a great amount of past nurture influence which cannot be
undone, there is no possibility of experimentally isolating the physio- 
logical and the nurtural growth influences. Thus if the problem is to 
be attacked at all, the analysis must be mathematical and theoretical, 
and not experimental. 

As investigation is undertaken from the inorganic world up to the 
organic, through the protoplasmic, through the plant and animal king- 
doms, to the human, greater and greater difficulty is experienced in 
securing comparable samples of basic materials. Whereas every mole- 
cule of carbon can be taken as a sample of every other, and whereas 
nearly every clam shell may be taken as a sample of every other of the 
same age and species, certainly every boy of five cannot be taken as a 
fair sample of every other boy of five. The problem of sampling takes 
on peculiar importance and difficulty in connection with psychological 
issues. Some there are who maintain that all of human psychology is 
misleading because no one is comparable to anyone else. Granting 
this as a logical proposition, it is nevertheless found empirically that in 
many respects they are. Give me 100 boys of the same age and school 
grade, and divide them by chance into halves. Upon the basis of per- 
formance of the one group, I will venture to predict the average per- 
formance of the other within narrow limits upon any matter of physical 
prowess or intellectual accomplishment that one may wish to choose. 
In some very real and serviceable sense, the one group of fifty is a 
sample comparable to the other. Logical and statistical criteria of 
comparability should be developed by psychology, for in this field, as 
in the field of economics where equal sampling difficulties prevail, little 
of a basic nature has been done to meet the logical issue. 
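
The claim about halves divided by chance may be illustrated by a small simulation. The sketch below is a hypothetical example only: the scores are drawn from an assumed normal distribution with mean 100 and standard deviation 15, and the seed is arbitrary. Nothing here is taken from actual measurements.

```python
import random

rng = random.Random(7)

# Hypothetical scores for 100 boys of the same age and school grade.
scores = [rng.gauss(100.0, 15.0) for _ in range(100)]

# Divide the group by chance into halves.
rng.shuffle(scores)
first_half = scores[:50]
second_half = scores[50:]

mean_first = sum(first_half) / len(first_half)
mean_second = sum(second_half) / len(second_half)

# The mean of one half serves as the prediction for the other.
prediction_error = mean_second - mean_first
print(round(mean_first, 1), round(mean_second, 1), round(prediction_error, 1))
```

With individual differences of the assumed size, the two half-group means would be expected to differ by only a few points, though single boys differ among themselves far more widely.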

Psychology mainly concerns itself with mental life. Mental life has 
one very distinct feature from all other life—it has no single sensory or 
motor end organ. This fact provides the basis for some subtle and 
entrancing logical and statistical problems. The physical scientist, the 
astronomer, the biologist, the geologist, and even the doctor and 
economist can start their thinking from some important observable 
fact, a measure of electrical pressure, of etheric vibration rate, of body 
size, or temperature, or price per bushel at some place of some com- 
modity. In each instance there is a physical location of the thing 
measured. Except for such sensory and motor functions as are 
mediated by particular end organs, there is no specific location known 
for mental acts. The phenomenon of relearning after decerebration
suggests that there really is no single location for the more important 
mental acts. The psychologist of mental life is apparently not called 
upon to measure a thing, but to measure a something which is once
removed from things; let us call it a relationship. What is the first
word that you think of when I say “table”? Perhaps your answer is 
“book.” The mental process aroused in your mind by my question 
has led to an overt response from which I may be able to infer the exist- 
ence of a mental relationship, but I cannot locate it in space, hold it 
and go back to it, as a doctor can go back to an ailing finger, and attend 
and reattend to it. 

Since a relationship is apparently just as true, viewed from the top as 
from the bottom, from the left as from the right, there seems to be no 
given point of departure which is right in some sense in which others 
are wrong. As I see it, this is equivalent to the mathematical statement
that one set of axes of reference is as good as a second. Wherever
mental acts are intimately connected with sensory organs, as music 
with the ear, or painting with the eye and hand, the statement just 
made will presumably not hold. In such fields of mental life as those 
of interest, attitude and sundry intellectual abilities, the statement 
may hold. If so, the logical and statistical consequences are far 
reaching. Their full import is a matter for the future to discover, but 
it seems fairly clear that the mathematical treatment called for will 
involve a search by statistical means for the invariants of mental life 
as axes of reference are changed. So far as I know, statistical demands 
of this sort are unique to psychology. 

It may be that certain invariant features of the mental relationships 
that course through a child’s mind will be found to be measurable with a 
certainty which does not now attach to measuring his specific informa- 
tion upon a given topic, or his ability to perform a specific type of 
mental work. Certainly it is true that these latter things are now 
measured with a far too low reliability to provide the much needed 
data for refined understanding and analysis of individual mental life. 
Low reliability presents psychology with another nearly unique statisti- 
cal problem—that of correcting for the systematic errors in available 
measures, and estimating underlying true relationships. One phase of 
this is represented by the now common correction of a correlation co- 
efficient for attenuation. This technique is practically unknown out- 
side of the field of education and psychology. That it is necessary in 
the solution of certain problems in these fields need not be argued. It 
may, however, be pointed out that all statistical measures coming 
originally from measures of low reliability, and this means all mental 
measures, not merely correlation coefficients, should be scrutinized for 
systematic as well as chance errors. Probably greater progress has been 
made in handling this statistical issue, first clearly pointed out by Spear- 
man, than any other of those that are relatively unique to psychology. 
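
The correction for attenuation mentioned above is commonly written as the observed correlation divided by the geometric mean of the two reliability coefficients. The sketch below assumes that form. The function name and the numerical values are hypothetical and serve only to show the direction and size of the adjustment.

```python
import math


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed
    correlation and the reliability coefficients of the two measures."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: an observed correlation of .42 between two tests,
# each with a reliability coefficient of .70.
observed = 0.42
corrected = correct_for_attenuation(observed, 0.70, 0.70)

print(round(corrected, 2))
```

The corrected figure always exceeds the observed one in absolute value when the reliabilities are below unity, which is why low reliability of the original measures matters so much.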













In addition to the sampling difficulty already pointed out, there is 
still another. In mental measurements, there are double sampling 
errors. There is the deviation of the individual chosen from the mean 
of the group which he is supposed to represent, and second, there is the 
deviation of the particular mental measure obtained from this indi- 
vidual from his own mean response. Suppose a single reaction time 
observation of a single subject yields a time of 150 σ. The mean time
for this individual for just such reactions may be 140 σ. Thus as a
sample of the individual, there is a 10 σ error in the obtained record.
Further, this individual is probably thought of as a fair sample of a
large group. The mean time for this large group is, let us say, 120 σ.
Thus, the individual’s true time, his mean reaction time, which is 140,
is 20 σ in error as representing the group, and his single obtained reaction
time, 150, is in error by 10 as representing himself, so that as
representing the group it is in error by the algebraic sum of the 10 and
the 20. Some psychological problems are only concerned with errors of 
sampling of the first sort, and some with errors of the first sort and the 
second sort combined. No psychological issue can be concerned with 
errors of the second sort only, for the individual score is (a) taken as a 
measure of the individual’s true score, or (b) taken as a measure of the 
group score. This exhausts the options. Keeping track of these 
respective sorts of sampling errors is a problem in other social sciences, 
but it is probably more clearly recognized as a distinct problem in 
psychology than in economics or sociology. 
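
The reaction-time illustration above can be set out as a short calculation. The sketch simply restates the figures of the example, in σ (thousandths of a second). The variable names are illustrative.

```python
# Figures from the reaction-time example, in thousandths of a second.
observed_time = 150    # single obtained reaction time
individual_mean = 140  # the individual's own mean reaction time
group_mean = 120       # mean reaction time of the group represented

# Error of the second sort: the single record as a sample of the individual.
error_within_individual = observed_time - individual_mean

# Error of the first sort: the individual as a sample of the group.
error_between_individuals = individual_mean - group_mean

# As a measure of the group score, the single record carries both errors.
total_error = observed_time - group_mean
assert total_error == error_within_individual + error_between_individuals

print(error_within_individual, error_between_individuals, total_error)
```

The sum is algebraic: had the single record fallen below the individual's mean, the two errors would have partly offset one another.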

There are probably still other unique statistical issues which concern 
the psychologist, but those mentioned will suffice to show that the 
statistics of psychology is a fairly clearly defined and in certain respects 
an independent structure, calling for the independent research of 
statisticians who are at the same time psychologists, and of psycholo- 
gists who are at the same time statisticians and mathematicians. 























STATISTICAL METHODS IN PERSONALITY STUDIES: 
RELIABILITY 


By Mark A. May 


The data of personality studies are of such nature that neither the 
statistical methods of economics and sociology nor those of psychology 
and education will wholly apply. Workers in this field have, as a 
matter of fact, drawn heavily on the statistics of psychology and edu- 
cation, partly because their data are of a psychological nature but 
mainly because they have had their training in this style of statistics. 
The tendency to apply to one’s data the statistical techniques which are 
familiar and ready at hand is quite natural but not always conducive 
to scientific progress. As interest in personality studies increases and 
as data accumulate, there will be an increasing tendency toward the 
development of a set of statistical methods that apply more directly 
than the existing stock of formulas found in textbooks on statistics for
students of psychology and education. 

Before discussing the reasons why current statistical techniques are 
inadequate for handling the data of personality studies, it is important 
to examine briefly the nature of these data. At their face value they 
represent a grand assortment of opinions, ratings, observations, test 
scores, answers to questionnaires, results of interviews, case histories, 
biographies, letters, documents and laboratory experiments. In look- 
ing them over one is impressed with the scarcity of truly quantitative 
measurements. 

A careful examination of the majority of the so-called quantitative 
data of personality studies (and of most psychological and educational 
studies also) reveals the fact that they are mainly crude quantitative 
descriptions of responses in the form of enumerations or counts rather 
than precise measurements of amounts of traits, abilities, opinions or 
attitudes. For example, the results of tests are supposed to be the most 
quantitative of all data in this field, but even here a raw test score is 
usually only a count of correct responses, or of responses of one type. 
It tells us only how many responses of a previously defined type the 
subject gave; it does not tell us, at least in the raw form, how much of a 
given trait the individual possesses, neither does it tell us, save in a very 
rough way, whether he possesses more or less of the trait than others. 

This view of the nature of the data of personality studies may be 
called the sampling theory in contrast with the measurement theory. 
The sampling theory holds that all of these data, including test scores, 








Ge ale 
3 a 7 
ae as 


pied 


—— 


. = Wis 











ey OU 
a bg 
uu & 
ut i 
: 
‘S. ie 
re 
re f : 
aq @ 
e ff 
y oi 
Tr ¢ 
3 
a ) 
t 


Oi ep IN Ala Vib et cae on 





eidiacsins ore 

















Proceedings 169 


character ratings, attitude expressions, observations of behavior, case 
histories, questionnaire returns and all the rest, are in fact so many 
samples of types of behavior, opinion, abilities or attitudes. The 
theory further assumes that traits, abilities, attitudes and opinions can 
be measured only in the sense that they can be sampled. The measure- 
ment theory, on the other hand, holds that traits, abilities, attitudes 
and the like exist as entities within the individual and can be measured 
in the same sense as electricity and other physical forces. This requires 
devising instruments or techniques of measurement that run in equal 
units counted from defined zero points. All of this is avoided some- 
what by the sampling theory, which holds that traits are abstractions 
in that they are only names given to elements which a large number of 
responses may have in common. They can be described quantita- 
tively only by securing a large number of situations which possess the 
common element in question. Some samples may be described in 
units of amount where such units are found. But this is a different 
matter from assuming an inner force or entity which manifests itself in 
varying amounts in behavior or expressions of attitude and opinion. 

I shall not attempt here to defend further the sampling theory of 
the nature of the data of personality studies but point out that if this 
view is the correct one, then the statistical procedures for dealing with 
these data must differ somewhat from those designed to deal primarily 
with the results of true measurements. In another paper¹ I have
shown, for example, that Yule’s formula for finding partial correlations 
is inadequate in dealing with the results of character and personality 
tests, and have proposed a substitute formula which is freer from as- 
sumptions and which fits these data better. In this paper I shall at- 
tempt to show that the usual formulas for finding the reliability of 
tests and testing techniques are inadequate for dealing with person- 
ality data, especially if the sampling view is taken concerning the basic 
nature of these data. 

Consider the usual statistical methods of determining the reliability 
of a test, technique or method of securing data. The standard pro- 
cedure is to secure two independent sets of data on the same trait or 
ability, using the same subjects. The coefficient of correlation between 
these two sets is regarded as a measure of the reliability of the test, 
technique or method. But this method will not apply to personality 
studies because in the first place it is based on certain assumptions 
which are not always fulfilled by the data, and in the second place it 
assumes a measurement and not a sampling theory. 


¹ Mark A. May, “A Method for Correcting Coefficients of Correlation for Heterogeneity in the Data,”
Journal of Educational Psychology, Vol. 20, 1929, No. 6, pp. 417-423.
















Consider first the assumptions involved. Reliability implies con- 
sistency and consistency means low individual variability. In a test 
of high reliability the errors of measurement are small in comparison 
with the quantity measured. An error of measurement is a deviation 
from a true score and a true score is arbitrarily defined as the mean of 
a large number of measurements of the same trait, function or ability 
in the same individual. Or, in other words, an individual does not al- 
ways respond in precisely the same way to the same situation, nor does 
he always make the same score on a given test; in short, there is vari- 
ability in his performance, opinion or attitudes, even in situations 
that are as nearly alike as can be found. If a large number of
observations, tests or ratings are taken on each individual under the same
general condition, it is possible to compute for each individual a mean score
and a standard deviation. The standard deviation is a measure of 
individual variability. It is with the size and causes of individual 
variance that reliability is concerned. 

Statistically, unreliability is the ratio of the average of individual
variances to the variance in a distribution of samples when one is
selected at random from each individual’s set of measures. Hence
unreliability is $\sigma_e^2/\sigma_1^2$ and reliability is $1 - \sigma_e^2/\sigma_1^2$. The
self-correlation between scores chosen at random in sets of two each from
each individual’s measures is identical with reliability as defined above
provided certain conditions are fulfilled by the data.

The algebra involved is briefly as follows:

Let $\sigma_e^2$ designate the average of the individual variances around their
own means.
Let $\sigma_m^2$ designate the variance in the individual means.
Let $\sigma^2$ designate the variance in all measures around a general mean.
Let $\sigma_1^2$ designate the variance in a set of scores when one is selected at
random from each individual’s measures.
Let $r_{12}$ designate the self-correlation between two sets of such measures.

The following relationships hold among the above standard deviations:
$\sigma_e^2 = \sigma^2 - \sigma_m^2$ and $\sigma^2 = \sigma_1^2$ within the limits of sampling errors.
Hence $\sigma_e^2 = \sigma_1^2 - \sigma_m^2$. Now $\sigma_m^2 = \sigma_1^2 r_{12}$ provided certain assumptions
hold. If so, then $\sigma_e^2 = \sigma_1^2 - \sigma_1^2 r_{12}$ and $r_{12} = 1 - \sigma_e^2/\sigma_1^2$. But this
depends on whether or not $\sigma_m^2 = \sigma_1^2 r_{12}$. Suppose two scores are selected
at random from each individual’s distribution. Call them $X_1$ and $X_2$. Then
$X_1 = m + e_1$ and $X_2 = m + e_2$; then

$$r_{12}\,\sigma_1\sigma_2 = \sigma_m^2 + \left[\, r_{me_1}\sigma_m\sigma_{e_1} + r_{me_2}\sigma_m\sigma_{e_2} + r_{e_1e_2}\sigma_{e_1}\sigma_{e_2} \,\right].$$

If the terms in brackets add up to zero, then, and only then, will
$\sigma_m^2 = r_{12}\,\sigma_1\sigma_2$, and if $\sigma_1 = \sigma_2$, then $r_{12} = \sigma_m^2/\sigma_1^2$ and $\sigma_m^2 = \sigma_1^2 r_{12}$.
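The identity between self-correlation and reliability can be checked numerically. The following is only a sketch under assumed conditions: the individual means and the errors are drawn from normal distributions, the errors are independent of the means and of each other, and every individual has the same error spread. The names `simulate`, `sd_m` and `sd_e` and all the numerical values are illustrative and do not come from the paper.

```python
import random
from statistics import mean, pvariance


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_people=20000, sd_m=2.0, sd_e=1.0, seed=1931):
    """Return (self-correlation, 1 - sigma_e^2 / sigma_1^2) for simulated scores."""
    rng = random.Random(seed)
    # Assumed model: X = m + e, with e independent of m and equally
    # variable for every individual, so the bracketed terms vanish.
    m = [rng.gauss(50.0, sd_m) for _ in range(n_people)]
    x1 = [mi + rng.gauss(0.0, sd_e) for mi in m]
    x2 = [mi + rng.gauss(0.0, sd_e) for mi in m]
    r12 = pearson(x1, x2)
    reliability = 1.0 - sd_e ** 2 / pvariance(x1)
    return r12, reliability


r12, reliability = simulate()
print(round(r12, 3), round(reliability, 3))
```

With these assumed values the variance of the means is 4 and the error variance is 1, so both printed figures should lie close to 4/5. When the errors are made to depend on the means, the two figures need no longer agree.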


The above derivation stands or falls with the assumption that the 
three terms in brackets sum to zero. Two of these terms are correla- 
tions between individual means and the deviation from these means of 
the two sets of scores selected for correlation, the other is the correla- 
tion between the two sets of deviations. Whether the three correla- 
tions are zero, or add up to zero, depends partly on the extent to which 
each individual’s distribution approximates normality and partly on 
whether or not individual differences in variability (i.e., individual 
SD’s) are correlated with individual differences in the ability (i.e.,
individual means). Whether there is correlation here is a matter for
experimental determination in each case. Thorndike¹ has accumulated
certain evidence which indicates that in the case of measurements of
intelligence there is no correlation; but Hartshorne and May² have
evidence that in the case of deceit there is correlation between individ- 
ual means and standard deviations. In order to make such deter- 
minations it is necessary in every case to have more than two scores 
or measures for each individual. 

The conditions under which self-correlation measures reliability are: 
(1) that all individual variance be regarded as errors of measurement; 
(2) that each individual’s distribution of scores be approximately 
normal, or at least symmetrical, and that the two scores selected for 
correlation be taken at random; (3) that the means and standard 
deviations of individual distributions be uncorrelated. 
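Condition (3) can be examined directly once several scores are available for each individual. The sketch below computes each individual’s mean and standard deviation and correlates the two. The data are invented for illustration: in one set the spread is the same for everyone, in the other it is assumed to grow with the individual’s mean. The function name `mean_sd_correlation` is hypothetical.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_sd_correlation(scores_by_person):
    """Correlation between individual means and individual SDs.

    Each inner list holds the repeated scores of one individual;
    more than two scores per individual are needed for a usable SD.
    """
    means = [mean(scores) for scores in scores_by_person]
    sds = [pstdev(scores) for scores in scores_by_person]
    return pearson(means, sds)


rng = random.Random(7)
uncorrelated, correlated = [], []
for _ in range(2000):
    m = rng.gauss(50.0, 5.0)
    # Same spread for every individual: condition (3) holds.
    uncorrelated.append([rng.gauss(m, 3.0) for _ in range(10)])
    # Spread assumed proportional to the mean: condition (3) fails.
    correlated.append([rng.gauss(m, 0.1 * m) for _ in range(10)])

r_ok = mean_sd_correlation(uncorrelated)
r_bad = mean_sd_correlation(correlated)
print(round(r_ok, 2), round(r_bad, 2))
```

A value near zero for the first set and a clearly positive value for the second is what the assumed data should produce. With real data the sign and size of the correlation remain a matter for experimental determination.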

In personality studies the first condition stated above assumes a 
very important rôle. It is really a question of how much of individual
variance is to be included in reliability. Strictly speaking, reliability 
refers only to that portion of individual variance that is due to varia- 
tions in the testing conditions, or the conditions of gathering the data, 
whatever they may be. In character and personality studies where 
the data are sample responses representing types of conduct, attitudes, 
opinions and abilities, individual variance may be produced by a 
wide variety of factors, some of the more important of which are as 
follows: (a) variation in testing procedure, or in the general procedure 
of recording the data; (b) variation in external conditions, such as in 
temperature, time of day, social environment and the like; (c) changes 
in internal conditions in the individual, such as fatigue, worry, head- 
aches or other bodily ailments, increase or decrease in motivation, 
drive, interest or attitude toward the test or experiment and the like; 


¹ E. L. Thorndike, The Measurement of Intelligence.
² H. H. Hartshorne and M. A. May, Studies in the Organization of Character, pp. 313 ff.
















(d) practice effect, or any change due to previous testing or observa- 
tion; (e) maturation, growth, integration or similar factors due to the 
mere lapse of time. The first two, (a) and (b), may be grouped and 
designated as variations due to conditions under which the data are 
gathered; the remaining three may be grouped and designated as 
variations within the individual. In personality studies it is impor- 
tant that these two groups be kept apart because one of the major 
problems is that of determining the degree of variation in individual 
responses to the same situation. 

The point which we wish to emphasize here is that if for any reason 
it seems desirable to separate the above two groups of factors, calling 
the first group factors of unreliability of method, and the second group 
variations within the individual, then the self-correlation of two meas- 
ures selected at random from each individual’s distribution will not 
measure the variance in either group but will measure the variance in 
all groups together, provided the assumptions discussed above hold. 

In personality studies it is important that individual variance due to 
the first set of factors be separated as sharply as possible from the sec- 
ond set. One reason is that in many studies the main object of inquiry 
is the amount of individual variance due to factors within the individ- 
ual; another reason is that when all factors are lumped and the total 
variance is determined by the usual process of self-correlation, many 
personality tests and other methods of securing data appear to have 
a very low reliability. For example, the Downey Will-Temperament 
test is notorious for its low self-correlation and the same is true to a 
lesser degree of many other personality tests. In fact, there are very 
few, if any, techniques for studying personality that have self-correla- 
tions of .90 or better. This state of affairs is due in no small measure 
to the fact that variations within the individual are relatively large. 

The problem of separating the factors producing variation within 
the individual from those producing variation in method is more ex- 
perimental than statistical. I know of no statistical hocus-pocus by 
which the trick may be turned when the only available data are two 
sets of scores, or two sets of data secured on the same individuals 
under the same general conditions. The famous split-form technique 
will not do it for two reasons. In the first place, it eliminates too much 
in that it not only takes care of variations within the individual but 
also rules out variations in the testing process. Thus it gives self-cor- 
relations that are too high. In the second place, it is difficult to secure 
split-form data in many types of personality tests. 

The best way out is to determine the reliability of the method 
under conditions in which variations within the individual are
experimentally controlled. For example, the reliability of a chronoscope as
a device for measuring reaction time is determined experimentally. 
The reliability of various observational techniques for securing per- 
sonality data may be similarly determined. When it comes to testing 
techniques the problem is more complicated but it can be done. For 
example, Hartshorne and May standardized certain of their techniques 
for sampling deceit by experimentally eliminating the deception fac- 
tors in preliminary studies. When variations within the individual 
are thus experimentally held constant, it is possible to determine the 
reliability of a method by self-correlation provided the conditions speci- 
fied in the derivation of the formula are fulfilled. 

If the reliability of the method is determined experimentally in ad- 
vance it is possible, then, by self-correlation to approximate the average 
amount of individual variance that is attributable to the second set of 
factors (i.e., variations within the individual), provided these factors 
are uncorrelated. The average individual variance when all factors 
are included is given by the formula $\sigma_e^2 = \sigma_1^2(1 - r_{12})$ provided the
assumptions hold. The variance due to the unreliability of the method
is given by $\sigma_u^2 = \sigma_0^2(1 - \mathrm{rel.})$.¹ Then the variance due to variations
within the individual is the difference between these. Call it $\sigma_w^2$.
Then $\sigma_w^2 = \sigma_1^2(1 - r_{12}) - \sigma_0^2(1 - \mathrm{rel.})$. If $\sigma_1^2 = \sigma_0^2$ then $\sigma_w^2 = \sigma_1^2(\mathrm{rel.} - r_{12})$.
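The subtraction just described may be put in a few lines of code. This is a sketch only; the function name, the argument names and the figures are hypothetical, and the result is meaningful only when the assumptions of the derivation hold and the two groups of factors are uncorrelated.

```python
def within_individual_variance(var_all, r12, rel, var_method_set=None):
    """Average variance attributed to variations within the individual.

    var_all        -- variance of one set of scores with all factors present
    r12            -- self-correlation of two such sets
    rel            -- reliability of the method, determined experimentally
    var_method_set -- variance of one set of scores in which all individual
                      variance is due to method (taken equal to var_all
                      when not supplied)
    """
    if var_method_set is None:
        var_method_set = var_all
    total_individual = var_all * (1.0 - r12)      # all factors together
    due_to_method = var_method_set * (1.0 - rel)  # unreliability of method
    return total_individual - due_to_method


# Hypothetical figures: score variance 100, self-correlation .60,
# experimentally determined reliability of the method .90.
w = within_individual_variance(100.0, 0.60, 0.90)
print(round(w, 6))
```

With these invented figures about 30 of the 40 units of individual variance would be assigned to variations within the individual and about 10 to the method.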

Before leaving the question of reliability mention should be made of 
the conditions under which the famous Spearman-Brown formula for 
predicting reliability of an extended test may be used in personality 
studies. As is well-known, this formula is simply a condensation of 
Spearman’s formula for the correlation between two finite series of 
equal length in terms of their elements. 

The formula is as follows:

$$r_{(1+2+\cdots+a)(\mathrm{I}+\mathrm{II}+\cdots+b)} = \frac{a\left[\, r_{1\mathrm{I}} + (a-1)\, r_{1\mathrm{II}} \,\right]}{\sqrt{a + (a^2 - a)\, r_{12}}\;\sqrt{b + (b^2 - b)\, r_{\mathrm{I\,II}}}}$$

in which $r_{1\mathrm{I}}$ is the average reliability of the elements,
and $r_{12}$ is the average inter-r of the first series, or first testing,
and $r_{\mathrm{I\,II}}$ is the average inter-r of the second series,
and $r_{1\mathrm{II}}$ is the average cross-r between the two series.

Now if all these different average r’s are assumed to be equal, then
the above formula reduces to

$$\frac{ar}{1 + (a-1)\,r}$$

which is the Spearman-Brown formula.
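The reduction can be verified numerically. The sketch below assumes two series of equal length a, with each element of the first series paired with a corresponding element of the second; the form given to the numerator is a reading adopted for illustration, and the function and argument names are hypothetical.

```python
def spearman_brown(r, a):
    """Predicted self-correlation of a test lengthened a times."""
    return a * r / (1.0 + (a - 1) * r)


def correlation_of_sums(a, r_elements, r_cross, r_first, r_second):
    """Correlation between the sums of two series of a elements each.

    r_elements -- average reliability of the elements (corresponding pairs)
    r_cross    -- average cross-r between non-corresponding elements
    r_first    -- average inter-r within the first series
    r_second   -- average inter-r within the second series
    Assumes equal standard deviations for all elements.
    """
    numerator = a * r_elements + a * (a - 1) * r_cross
    denominator = ((a + (a * a - a) * r_first) ** 0.5
                   * (a + (a * a - a) * r_second) ** 0.5)
    return numerator / denominator


# When all the average r's are equal the two results coincide.
general = correlation_of_sums(4, 0.5, 0.5, 0.5, 0.5)
predicted = spearman_brown(0.5, 4)
print(round(general, 4), round(predicted, 4))

# When the cross-r falls below the inter-r's they part company.
unequal = correlation_of_sums(4, 0.5, 0.3, 0.5, 0.5)
print(round(unequal, 4))
```

The second call is meant only to show the direction of the discrepancy when the equality of the average r’s is not granted.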
¹ The symbol $\sigma_0^2$ designates the variance in one set of scores selected out of individual distributions in
which all variance is due to method, and rel. is the reliability of the method obtained by the self-correlation
of two such sets of measures.



























In personality studies the Spearman-Brown formula has been used 
very freely and often uncritically for predicting the correlation between 
the sum or average of two or more sets of scores or samples of responses 
and another set of equal length. The obtained reliability coefficient 
from which the prediction is made is nearly always the self-correlation 
between two samples. Thus all the criticisms that have been made of 
self-correlations as measures of reliability apply here, and more, too, 
because still further assumptions are involved. The additional
assumptions are: (1) that the standard deviations of all samples or sets
of scores are the same; (2) that all the sets of average inter-r’s as
shown above are equal. 

In regard to the assumption of equality of standard deviations of 
samples, if the factors of practice, or the effects of previous testing or 
observation, or maturation or growth enter into the complex of causes 
of individual variance, then the two standard deviations of the cor- 
related scores will probably not be equal. In regard to the second as- 
sumption, the different sets of average inter-r’s are more likely to be 
the same in cases where the self-correlation is between two forms of 
the same test or where the observations are of the same conduct in 
different situations, than in cases where the tests are repeated or where 
the samples are from the same situation. 
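The bearing of the first assumption may be illustrated by a small calculation. In the sketch below every pair of elements is assumed to correlate to the same degree r, so that the second assumption is satisfied, while the elements are allowed unequal standard deviations. The function names and the figures are hypothetical.

```python
def spearman_brown(r, a):
    """Predicted self-correlation of a test lengthened a times."""
    return a * r / (1.0 + (a - 1) * r)


def correlation_of_sums(sds_first, sds_second, r):
    """Correlation between two sums of elements.

    Every pair of distinct elements, within or across the two series,
    is assumed to correlate r; each element has its own SD.
    """
    def variance_of_sum(sds):
        own = sum(s * s for s in sds)
        pairs = sum(si * sj
                    for i, si in enumerate(sds)
                    for j, sj in enumerate(sds) if i != j)
        return own + r * pairs

    covariance = r * sum(sds_first) * sum(sds_second)
    return covariance / (variance_of_sum(sds_first)
                         * variance_of_sum(sds_second)) ** 0.5


equal_sds = correlation_of_sums([2.0, 2.0], [2.0, 2.0], 0.5)
unequal_sds = correlation_of_sums([1.0, 3.0], [1.0, 3.0], 0.5)
predicted = spearman_brown(0.5, 2)
print(round(equal_sds, 4), round(unequal_sds, 4), round(predicted, 4))
```

With equal standard deviations the correlation of the sums agrees with the Spearman-Brown prediction; with the unequal ones assumed here it falls below it, although every inter-r is the same.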

To sum the matter up concerning reliability, the usual procedure of 
finding the correlation between two samples, or two sets of scores or 
measures secured from the same individual on the same test or similar 
tests, or under the same or similar conditions, will not yield a satisfac- 
tory measure of the reliability of techniques for the study of personality 
for two reasons: first, the means and standard deviations of the 
measures on each individual are likely to be correlated, making the 
self-correlation of two samples too high or too low depending on the 
degree and the sign of the correlated means and standard deviations; 
second, individual variance is one object of personality study and may 
be due in many instances to factors other than variations in method. 




















STATISTICAL METHODS IN COLLEGE 
ADMINISTRATION 


By Herbert A. Toops


If we grant the aim of pre-college education to be the acquirement by 
every citizen of a minimal amount of the social heritage, then perhaps 
we may grant the college the unique aim of preparing the survivors of 
the pre-schools for making additions to the social heritage—the aim of 
training creative ability. 

One of the first tasks of higher education, then, is to become aware 
of what the previous schools and other agencies have done to its source 
of supply, its raw material, the boys and girls graduating from high 
school. A second fundamental glance must also be taken at what the 
college does to those entering. And then, having taken these two 
glances, we shall be in a position to guess what ought to be done next. 

We are coming to believe that the problems of the college can only 
be solved by a concerted research effort involving as subjects of in- 
vestigation the college students before they come to college and even 
before they start to high school, their high school teachers, their college 
teachers, and ultimately their employers. 

It has long been known that only a small per cent—ten per cent or 
so—of the population become high school graduates, and that of these 
not over one-fourth to one-third go to college. It has likewise been 
known that a large proportion of those going to college fail to graduate. 
Assuming—very gratuitously assuming—that this is as it should be, the 
thought at once occurs, “Are there not many high school graduates of
real ability, not now going to college, who might replace those who now 
come to college only to fail?” 

In each of the past two years in Ohio, practically every senior (over 
30,000 annually) has been given a uniform intelligence test of 114 hours 
length. These studies show that the average high school senior is but 
slightly inferior to the Ohio college freshman, of whom practically all 
have been tested annually for the past four years to obtain check 
figures. This means that, granted the powers of an absolute dictator, 
and with some 12,000 freshmen to be chosen from 40,000 potential ap- 
plicants, the colleges might be recruited almost twice over from the top 
half of intelligence, choosing almost none from the lower half. We 
shall later consider whether this would be wise. But wise or no, we do 
not have the powers of a dictator! The same power which stays the 
hand of the dictator forbids—the power of economic necessity. 
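The arithmetic behind “almost twice over” is simple enough to set down. The sketch below uses only the round figures quoted in the text; the function name is hypothetical.

```python
def times_recruitable_from_top_half(potential_applicants, places):
    """How many times over the places could be filled from the top half."""
    top_half = potential_applicants / 2
    return top_half / places


ratio = times_recruitable_from_top_half(40000, 12000)
print(round(ratio, 2))
```

The top half of 40,000 potential applicants is 20,000, which would fill 12,000 places about one and two-thirds times.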




















In connection with the above surveys we have collected socio- 
economic data which have been punched, together with the test data, 
into Hollerith cards. A few findings will be of interest: On a prelimi- 
nary sampling, all the families of architects in Ohio (1920 census fre- 
quency of architects) have sent during the past ten years to Ohio State 
University one freshman from every six families; while only one
freshman from every 2,000 laborers’ families has come to the State
University. The inequality of opportunity is not all due to financial
considerations; for, if we take the occupations of fathers of high school
seniors, every minister’s son and daughter (some seven) attending 
North High School, Columbus, was expecting to go to college in 1930- 
1931, while only one barber’s son out of twelve was expecting to go. 
The boy who owns a typewriter is over twice as likely to go to college 
as the one who does not own one; and likewise for the child of a father 
who has a telephone as versus the one whose name does not appear in 
the telephone directory. 

From such factors, it appears likely that an index of probability of 
going to college may be established at the day of entrance to high 
school. Then if the boy shows academic proficiency, his teachers may 
take steps to suggest to him and his family ways and means for making 
college attendance possible. One of such ways is to equip the high 
school boy or girl with those skills which at college are exchangeable for 
cash in outside employment. A survey of the extent of possession of 
100 skills by 3,000 college freshmen shows that salable skills are con- 
spicuous by their absence—dishwashing and care of babies being the 
main skills offered by the women; and unskilled labor and odd jobs by 
the men. 

Still another, and far more fundamental, approach, I believe, is to 
make a basic adjustment to the fundamental distribution of wealth and 
income as we have them in America. Every real estate man knows 
that there are fifty potential purchasers of a $5,000 house to one pros- 
pect for a $50,000 house. Similarly, a family’s resources for higher 
education hold out only just so long; they are not indefinitely expansible 
since a portion must come from savings and discounted future earnings 
—loans—rather than from current income. That being the case, why 
not compress the length of educational careers to a point where the 
wealth of American families is sufficient to purchase it in abundance? 
Why not do the work of nineteen grades in sixteen? Fortunately, this 
is easily accomplished—at least by those who are of nineteen grade 
calibre. It has for some time been known that at least one year can be 
saved in the elementary school; and, with two more in the high school 
and college, three in all would be saved. It is not very hard in the
elementary school to do the work of eight years in seven; for it has been
done in practically as many months by adults in mountain schools and 
army schools; and under war-time conditions, four-year trades have 
been mastered in half as many weeks. 

It will be harder in high school and college for the reason that there 
are so many vested interests to appease—so much teaching of Latin to 
pupils who will become teachers of Latin teaching their pupils to be- 
come teachers of Latin teaching their pupils to become . . . 

By the method proposed, all families now capable of purchasing a 
college diploma will be capable of purchasing a Ph.D. degree. Once 
again, a powerful vested interest objects: “But what will become of 
Ph.D.’s? We have an overproduction already!” Studies of the 
careers of Ph.D.’s have shown that they predominantly go into college 
teaching—probably because they weren’t fitted for anything else! 
Studies of their talents show that they are decidedly lop-sided. The 
answer obviously is “turn out super-Ph.D.’s.” Studies of the cur- 
ricula for Ph.D.’s in the various fields undoubtedly would show enough 
overlapping of subjects so that it would seem reasonable to believe that 
some three or four years beyond the ordinary Ph.D. would produce 
super-Ph.D.’s, with a specialized training in, say, city management, 
statistics, accounting, mathematics, economics, sociology, criminology 
and education—a well-rounded education! Let the degree degenerate; 
we already need its successor! With a few such men to run our city 
governments, we would be able in a few decades to solve the fate of our 
Al Capone’s in a permanent and fundamental way—the way of the city 
of Berkeley, California, perhaps. Society now profits by the funds 
with which it endows the trade education of the banker, lawyer, teacher 
—even though it has in the main not provided for the trade education 
of the butcher and candle-stick maker. Let it now go a step further 
and provide entirely for the education of these “leaders of progress”:
let it pay them $3,000 to $5,000 per year for three or four years after 
they reach the educational attainment of the present Ph.D. degree. 
Let us use for such purposes the funds now largely wasted on the vast 
horde of students who do not graduate. At Ohio State University 
eighteen per cent of the entrants graduate in four years; not more than 
thirty-five per cent will ever graduate; and in the studies which now are 
beginning to appear this record is not unusual. Surely much of the 
funds spent on the 65 per cent is wasted. 

Perhaps with an adequate guidance program in the high school all 
this will come about naturally. At least five states are making such 
annual inventories of the intelligence or attainment, or both, of their 
high school seniors with a view to undertaking a progressive guidance
program. Our own state of Ohio is embarking upon a comprehensive
guidance program, calling for the gradual training on the job, and by 
summer school attendance, of all the teachers of the state in guidance 
principles, philosophy and techniques. 

From the viewpoint of research, the guidance program is being sup- 
ported by an annual inventory by Hollerith methods of the more than 
40,000 teachers of the state. Hereafter Ohio will know how many 
35-year-old female teachers with twelve years of experience, teaching 
Latin but trained to teach mathematics, and receiving salaries of 
$1,800, there are in the state. Progress in upbuilding the quality of the 
teaching profession can thus be readily noted. From the point of view 
of ability, a state law requires each teacher-training applicant to pass a 
rigorous intelligence test. The minimum standard, in centile ranks of 
Ohio college freshmen, has been successively raised from approximately 
7 to 11, to 16, to 21. At the present time, then, 21 per cent of Ohio
college freshmen are unable, by definition, to meet the intelligence
standards for becoming teachers in Ohio.

The statisticians all tell us that the population of this country may be 
expected to become practically stationary in the next fifty years. 
Whether it will actually become so, depends on what happens in the 
meantime—on a great many things which may happen in the mean- 
time. I assume that it is inevitable that it will slow up. At the pres- 
ent time the largest single item for which taxes are collected is education. 
These educational funds now largely go into building and meagerly 
equipping an ever-expanding educational plant for an ever-enlarging 
school population. Once population becomes stationary, these funds 
for expansion can flow, and in increasing proportion will flow, into in- 
tensive bettering of the school program, into expansion of education at 
the top, with perhaps an ideal of “Every citizen a college graduate,” 
and into those projects of a socio-civic nature for which the enlightened 
leadership of super-Ph.D.’s, referred to above, is a necessity. 

That the above is not merely a Utopian dream may be validated from 
some of the statistics: first, as to differences of ability of individual 
students. Do we have any students worthy of super training? The 
100-percentile students graduate four times as frequently as the 1-per- 
centile students. Measured by hours or points, the bright accomplish 
from two to three times as much “college success” as the low-scoring 
students. The bright students do this at a study time of only 20 hours 
a week, while the low-scoring conscientiously study 50 hours a week. 
Is it not reasonable to believe that if the bright studied 50 hours a week 
also (2½ times as much as at present) they could accomplish not
the 2 to 3 times as much as the low-scoring students as at present, but
instead 5 to 8 times as much? President Hutchins of Chicago, it would
seem, has faith that this is so. The statistics indicate that the students
who waste most time in non-productive types of extra-curricular
activities are the bright ones; and, paradoxically, it may be stated that a
big “overload” of work, often a double load, is what is frequently
needed to salvage the scholarship of bright students who get on
probation.
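The figure of 5 to 8 times rests on the supposition that accomplishment rises in direct proportion to hours of study. The sketch below makes that supposition explicit; it is an assumption of the illustration, not a finding, and the function name is hypothetical.

```python
def projected_advantage(current_ratio, hours_now, hours_proposed):
    """Advantage of the bright over the low-scoring student if the bright
    student's accomplishment grew in proportion to hours of study."""
    return current_ratio * hours_proposed / hours_now


low = projected_advantage(2, 20, 50)
high = projected_advantage(3, 20, 50)
print(low, high)
```

Raising study time from 20 to 50 hours multiplies it by 2½, so a present advantage of 2 to 3 times becomes 5 to 7½ times, which the text rounds to 5 to 8.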

Second, by ironing out the kinks in the educational system. If we
define the difficulty of a college course as the average scholarship made,
on the average, by a student of average intelligence taking the course,
we find college courses so difficult that a student of average intelligence
makes “D,” a near-failure grade; others so easy that an average
intellect can readily attain a “B.” We find instructors in different
sections of the same course giving vastly different credits for the same
amounts of study.

The same phenomena may be found in the high school. Among
other things, this necessitates transmuting the high school marks in
terms of a common denominator so that we shall be able to say: “This
average mark of 60 in High School A is only a 40 of High School B,
which is used as our measuring rod.” This problem seems destined to
an early satisfactory solution. By utilization of high school trans- 
muted marks, by results from an annual highly valid intelligence test 
and by utilizing the results of the annual scholarship contests, we shall 
have very fair measures, indeed, of ability to make progress. From the 
variables now available, it is quite obvious that we shall be able to work 
out differential individual prognoses of scholastic progress so accurate 
that—if the present system of relative marks, courses, class recitations, 
departments, etc., is to be continued—we shall be able at a very low 
premium rate to insure that a student who passes our examination will 
receive a diploma; if not at College A, then at College B; if not in 
Course C, then in course D; if not under instructors E and F, then un- 
der instructors G and H;; if not by studying 20 hours a week, then by 
studying 40 hours a week; if not by studying by method I, then by 
studying by method J; if not by rooming with K who has a certain in- 
telligence, scholarship and ambition, then by choosing instead as room- 
mate, L, who has a more favorable intelligence, scholarship and ambi- 
tion. Certainly, the risk of not receiving a diploma at the present 
time is so great that any entrepreneur—unless an ardent gambler— 
would demand unusual rewards for accepting the risk involved or 
would hedge against it through insurance. Such a system of insurance 
would do more than any single thing done to date to convince college 
administrators and governing boards of the desirability of rigorous
selection before coming to college and of motivated individual progress in
college. (There is a real difference in elimination before entrance and 
elimination after entrance to college.) 

If it were not for limitations of time I should like to dwell on some of 
the research attainments of the laboratories of higher education; on 
some of the implications of what time does to the memory of college 
students for material learned earlier; of the possibilities of carrying on 
sociological research through the medium of high school and college stu- 
dents; of the possibility of research based on extensive annual measure- 
ment of students with the hope that shortly there will accrue a good 
crop of criterion success scores of, say, married people with babies, of 
divorcees, suicides and even murderers (these “natural experiments” 
replacing the controlled experimentation of the physical sciences); of
the possibilities of arriving at segregated categories of populations in 
the 30,000 high school seniors of a state—e.g., several hundred pairs of 
twins, any number of children of divorced parents, or orphaned chil- 
dren and of 18-year-old sons of architects; of the correlation which 
exists between scholarship of college roommates under certain condi- 
tions; of the implications of a proposal to study nationally the supply 
and demand for entrants to occupations—on which, as above noted, we 
shall have very adequate figures for Ohio teachers; on standard codes 
for use in Hollerith studies; of the implications of a proposed perpetual 
inventory, by Hollerith control card, for collecting information about 
the educational business which any commercial firm demands of its
accounting system; of the possibilities of coöperative research, employing
a proposed system of tabulation wherein 10,000 or more facts—an un- 
limited number, indeed—may be recorded and statistically tabulated 
into two-way tables for a population, limited, practically, at 40,000 
persons; of statistical schemes for upbuilding the personnel and teach- 
ing efficiency of the college teaching body; of developments in statistical 
methods necessary for research in this field—generalized regression 
equations, generalized point hour ratio formulae, formulae for obtaining 
the average of any number of intercorrelations without solving any of 
the correlations; of correlation methods which suffice to produce, by 
Hollerith machine, and to check twice over some 3,000 correlations of 
250 cases each in two days; of a specially constructed Hollerith machine 
which will automatically obtain the constants for the solution of all 
moments up to the ninth; of test scoring machines; of a method of scor- 
ing a 300 question examination in a second on the printing press; and of 
a machine which solves regression equations and automatically makes 
vocational aptitude predictions while you go to dinner! 

UNIFORMITY IN DEFINING, RECORDING AND
REPORTING STATISTICAL ITEMS


By Frank M. Phillips


In anticipation that the other papers would deal more or less with 
the technical processes involved in the statistical method because of 
their application to special fields, this paper is designed to cover the 
more general statistical phases of public school education. The clamor 
for uniform records and reports in the field of education has been 
going on for more than half a century. In 1871 the National Education 
Association appointed a committee to secure uniformity in city and 
state school statistics. The report of the United States Commissioner 
of Education a few years later, says: “There are still some localities
which are unable to furnish adequate statistics.” Although considerable
progress has been made since that time, the statement contains
as much truth today as it did fifty years ago. In 1912 the National 
Education Association and the United States Bureau of Education 
worked out, and issued jointly, bulletins on uniform records and reports 
which were given a rather wide distribution, and which served a most 
useful purpose in arousing not only interest but action where laws and 
customs permitted adoption of the report. 

During this period, however, those who were using educational sta- 
tistics began to demand more analysis and better definitions. In the 
meantime new educational situations arose which created a need for 
types of records and reports that did not exist in the early days. 
Moreover, the expanding educational program created new demands 
upon those responsible for the collection of educational statistics; so 
much so that in 1927 committees from the Department of Superin- 
tendence and the National Association of Public School Business Officials 
agreed to coöperate with the United States Bureau of Education in a
revision of the 1912 report. This new committee’s report was pub- 
lished jointly by the National Education Association and the Bureau 
of Education¹ during the latter part of 1927 and the early part of 1928.
A recent inquiry among the states reveals the fact that very few lo- 
calities find any legal obstacles to the adoption of the committee’s 
report. Thirty states say no obstacles whatever; three others say very 
few obstacles; while three more say no legal obstacles but that their 
statistical machinery cannot yet furnish records of educational statistics
according to the type of school.


1 National Education Association, Research Bulletin, Vol. 5, No. 5; United States Bureau of Education, 
Bulletin No. 24, 1928. 

Those of us who have been interested in per capita costs have long 
recognized the fact that the items entering into the computation of per 
capita costs are not always comparable. There is an old saying that 
“a pupil is always a pupil but a dollar is not always a dollar,” having
in mind the fact that the purchasing power of the dollar varies consider- 
ably over a period of time. A little study, however, reveals the fact 
that the definition of the pupil in the different states lacks uniformity, 
as will be brought out later in this discussion. Those of us who have 
been interested in making a rank of states, or of communities, educa- 
tionally, have discovered that some of the fundamental items which are 
either necessary or desirable in establishing an educational rank for 
different localities, vary considerably in these different localities. 

One of the chief functions of statistical manipulation is that of fur- 
nishing assistance in making comparisons. When two administrative 
units are to be compared these units should be approximately alike so 
far as causative factors are concerned with the exception of the item 
upon which the comparison is based. If those factors which enter into 
the results vary considerably, the more technical statistical processes 
are necessary in order to evaluate them and eliminate their effects. If,
for example, a comparison is to be made upon the basis of length of day,
between a plant operating eight hours a day and another ten hours a 
day, other factors entering into the output of these two organizations 
should be as nearly alike as possible, in order that any differences which 
may be discovered in the output of the two plants may be chargeable 
to the difference in the length of the day. In comparing per capita 
expenditures in two different institutions the items which enter into the 
cost of running these institutions should be practically identical, so 
that any difference found in per capita costs cannot well be charged 
to expenditures for certain items which the investigator overlooked. 
States cannot well be ranked educationally unless the items used in 
making the rank have pretty much the same meaning, and have had 
applied to them about the same educational yard stick in each state. 
While the statistical method has furnished us with tools for the elimi- 
nation of factors which vary but which are accurately measured, there 
is no mathematical hocus-pocus through which figures can be put to 
eliminate blunders, errors of computation, and mistakes in the concept 
of definitions which are fundamental in making comparisons. 

It has always seemed to me that the definitions of the fundamental 
items, such as the school, the teacher, the pupil and the length of ses- 
sions, might well have a uniform description, so that school men in 
different localities might be speaking the same language. The school, 
as a unit, because of the various concepts of the school, does not lend 
itself to statistical treatment. In nine states the school is defined as a 
classroom, including a teacher and her pupils; two states specify that 
this does not apply to the high school; and three specify that this de- 
scription includes a unit for which a single register is kept. Two states 
describe the school as a unit for instruction purposes; five, as a group of 
grades, having in mind the admonition that a kindergarten, an ele- 
mentary school and a high school housed in the same building should 
be reported as three separate schools. Twelve states refer to the school 
as a building, and six others as an organization under a single principal. 
Five states describe the school as a district or a sub-district. One state 
does not know its definition of a school; in another it is not defined, and 
reports from nine are lacking. A sub-district then, with two eight- 
room school buildings would in some localities have one school; in 
other localities two schools; and in still others, sixteen schools. These 
differences in definition make it utterly impossible to give any sum- 
marized notion of the number of schools in this country, and render it 
extremely difficult to make any comparison between different localities 
by schools. 
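
The sub-district example above can be restated as a short computation. The sketch below is illustrative only: the function name and the three definition labels are assumptions introduced here, not terms taken from any state's regulations.

```python
# Hypothetical sub-district from the text: two buildings of eight classrooms each.
buildings = [8, 8]  # number of classrooms in each building


def count_schools(buildings, definition):
    """Count 'schools' under one of three simplified definitions (labels are illustrative)."""
    if definition == "district":
        # The sub-district itself is treated as the school.
        return 1
    if definition == "building":
        # Each building is treated as a school.
        return len(buildings)
    if definition == "classroom":
        # Each classroom (a teacher and her pupils) is treated as a school.
        return sum(buildings)
    raise ValueError(f"unknown definition: {definition}")


counts = {d: count_schools(buildings, d) for d in ("district", "building", "classroom")}
print(counts)
```

The same physical plant thus yields one, two, or sixteen schools, which is why totals by schools cannot be summed across states.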

The concept “teacher” likewise varies considerably in different
localities. In eleven states only those giving instruction are included, 
and two of these specify that those included must teach at least one- 
half of the time. In five states librarians are included with class-room 
teachers, one of these specifying librarians in high schools. In ten 
states “teachers” include principals, supervisors and superintendents,
one specifying, “provided they teach half the time,” and one, “if they
give instruction.” Three of these include librarians if they give instruction.
Three states classify with teachers, librarians and nurses,
and two of these, school clerks as well. 

Among the forty-two states for which information is available on 
the definition of a teacher, nineteen include librarians; fourteen include 
principals; thirteen, supervisors; eleven, superintendents; five, nurses; 
four include the clerical force; three, the attendance officers; three, the 
stenographers; one, the dean of girls; one, the dental hygienists; and 
one, the janitors. Apparently the definition of teacher lacks uniformity 
in the different localities, and this item needs considerable foot-notation 
if comparisons are to be made between localities in which the concept 
“teacher” is important. 

The school is organized presumably for the education of our youth. 
In higher education we call them students, and in the elementary and 
high schools we call them pupils. Naturally the states might differ in 
their enumeration of those available for school attendance, but there 
might at least be some uniformity in the definition of an enrolled pupil. 
Thirty-six states announce the pupil enrolled when he presents himself 
for instruction. Two others specify that he must actually receive 
instruction. Two specify that he is enrolled when he is of compulsory 
school age. There are, however, some exceptions to this definition.
In Pennsylvania, for example, the regulations say: 


No child shall be counted as not belonging to the school, unless, upon investi- 
gation by the local attendance bureau or the proper school official, it shall be 
found that the child, (1) is deceased, (2) has moved from the district, (3) is 
enrolled in another school, (4) is legally employed, (5) is 16 years of age or over, 
and has withdrawn from school, (6) has been certified by the medical inspector 
and found to be not a fit subject for education, (7) or holds a certificate showing 
the completion of the work of the elementary school, resides two miles or more 
from any high school and transportation is not provided. 


If all the localities preserved the definition of the enrolled pupil to
the letter, it would be extremely difficult to make comparisons between 
a group of states where the child must actually present himself to be 
enrolled, and another group where he is enrolled automatically as soon 
as he becomes of compulsory school age. 

More agreement exists upon the definition of the attendance of a 
pupil but there is considerable difference of opinion as to when he 
should be dropped from the roll. In eleven states he is dropped when 
he has permanently withdrawn; in seven states, at the end of a semester 
or a year, provided he remains absent; and in two states, at the end of 
the month. In four states he is dropped after ten consecutive days of 
absence; in one, after six days; in another, after five days; and in six 
others, after three days of absence. Practices differ among the lo- 
calities in six states; two have no information concerning when pupils 
are dropped, and nine reports are lacking. 

In twelve states the attendance is counted by half days, deduction in 
four states being made for an absence in excess of one-fourth of a day, 
while in one of these states tardiness for one hour or more is deducted 
from the attendance. In six states attendance for one-half of the day 
or more counts as a full day; in two states no deductions are made for 
absence for a fraction of a day; in one, no deduction is made for less 
than one-half day’s absence; in one, deductions are made only for quar- 
ter days; and in still another, the pupil is counted present for the full 
half day if he attends as much as one hour of that session. In three 
states the pupil must be present all day to be granted a full day of 
attendance, and deductions are made for absences of all kinds. In 
five states no deductions are made for a fourth of a day, or less, of 
absence, two specifying that the pupil must be excused by his parent or 
guardian. In one state attendance for any part of the day counts for 
the full day. In two states the practice varies in different communities,
and in another, deductions are made for tardiness or for absence
amounting to one-quarter of a day or more. 

These concepts or practices concerning attendance vary all the way 
from making presence for any part of a day count as a full day, to mak- 
ing full day attendance only, count as a full day. 
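
A small sketch may make the range of these practices concrete. It is a simplification under stated assumptions: the rule labels, the sample week, and the treatment of a partial day as earning no credit under the strictest rule are all introduced here for illustration and do not reproduce any state's actual regulation.

```python
def credited_days(presence, rule):
    """Days of attendance credited to one pupil.

    presence: fraction of each school day actually attended (0.0 to 1.0).
    rule: one of three simplified, hypothetical crediting rules.
    """
    if rule == "any_part":
        # Presence for any part of the day counts as a full day.
        return sum(1 for p in presence if p > 0)
    if rule == "half_or_more":
        # Attendance for one-half of the day or more counts as a full day.
        return sum(1 for p in presence if p >= 0.5)
    if rule == "full_day_only":
        # Only full-day attendance counts as a full day (assumption: no partial credit).
        return sum(1 for p in presence if p >= 1.0)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical week: full day, half day, quarter day, absent, full day.
week = [1.0, 0.5, 0.25, 0.0, 1.0]
credits = {r: credited_days(week, r) for r in ("any_part", "half_or_more", "full_day_only")}
print(credits)
```

Under these assumptions the same pupil is credited with four, three, or two days for the same week, according to the rule applied.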

Considerable difference of opinion seems to exist among the several 
states concerning the days to be included in the regular school year. 
It is the practice in fourteen states to count only the days when schools 
are actually in session. Five states include teachers’ meetings in addi- 
tion to the days schools are in session, one state allowing as many as 
four days for teachers’ meetings and field meets. Seven states include 
such holidays as happen to fall upon the days when the schools are in
session, and presumably count all pupils present on those days. Nine
states include days both for teachers’ meetings and for holidays, two 
states specifying “provided these days are authorized by the Boards
of Education.” The practice varies in one state, with thirteen states
not reporting. Average daily attendance data are generally consid- 
ered to be the most reliable data of any submitted by the various 
states, although the figures may vary as much as 3 or 4 per cent because 
of the different practices involved in the computation of the average 
daily attendance. 

Communities differ also in their definition of the child’s age, al- 
though perhaps no statistical unit in education lends itself to more 
accurate measurement than does the age of a child. Within the same 
state age may be counted differently for census and for enrollment pur- 
poses, with another definition for age-grade and intelligence studies. 
Some twenty-nine states define a six-year-old child as one who has 
passed his sixth birthday but who has not yet reached his seventh. A 
number of others use this same definition but specify a time within the 
year when this condition must exist: in one state it is the opening of the 
school; in one, it is September first; in another, it is October first; in 
another, two months after the opening of the school; in three others, 
January first; and in one, by the thirtieth of April. 

The time of taking the school census varies considerably in the 
various states, and age as shown by the census figures may therefore be 
expected to vary considerably. A large majority of the schools, how- 
ever, open at practically the same time in the year, and for enrollment 
purposes it ought to be possible to reach an agreement concerning the 
definition of the age of a child. A variation of a half year at age six 
is a variation of over 8 per cent. 
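
The figure of over 8 per cent follows directly from dividing the half year by the age in question. A minimal check, using only the numbers given in the text (the variable names are introduced here):

```python
half_year = 0.5   # possible difference in reckoning age, in years
age = 6           # nominal age of the child, in years

# Relative variation, expressed as a percentage of the nominal age.
relative_variation = half_year / age * 100
print(round(relative_variation, 1))
```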

It is possible to extend this discussion of definition indefinitely by 
using items that are less fundamental in school records and reports 
than these which have already been mentioned. While the differences 
in definition are not all within the child-accounting phases of educa- 
tion, it seems to me that the majority of them are. More uniformity 
exists in the records and reports which deal with finances and other 
property items. Either the business officers have been more alert, 
or the accounting of finances has been more nearly standardized, but 
differences in the definitions of the finance items are much less pro- 
nounced than they are among items which deal with child counting. 

The manner of distributing State Aid has an influence in determining 
the volume of certain statistical items. When this aid is distributed 
upon the basis of the school census, some localities will find the names 
of a large number of children for the census list. When the distribu- 
tion is upon the basis of enrollment, some districts will have an over- 
sized school register. When it is upon the basis of attendance, some 
schools will make an excellent showing in attendance. The state 
cannot well check up on all its communities, since the contagion spreads 
over a wide area of exposure, and the tendencies become epidemic. 

A real school man does not solve problems of this nature; he only 
states them, studies them a while and then says that this is a good 
field for future investigation. Problems presented in this paper are 
not solved; they are not all stated. Only a few of them have been 
pointed out. Excellent work has been done by the committees re- 
ferred to, and their reports are available and have been placed in the 
hands of state officials. The adoption of the reports by the various 
communities is not within the province of the committees. A careful
and minute study of the whole field cannot well be
made by general committees, whose personnel is more than busy with
problems of its own. It is my belief that a small group devoting full 
time and paid for full time, might, within the space of two years, be able 
to work out a plan for harmonizing the differences in definition which 
exist in the various communities, and ought to be able to present in a 
convincing way the necessity of uniform reports and uniform records 
from which reports are made, and make recommendations which school 
men might generally accept and urge for adoption. 

SUMMARY OF PAPERS PRESENTED AT THE LUNCHEON 
ON SOCIAL SCIENCE ABSTRACTS 


By F. Stuart Chapin


Dr. W. C. Curtis, of the University of Missouri, and in charge of the 
Division of Biology and Agriculture of the National Research Council, 
spoke upon the experiences of the chemists and biologists in the de- 
velopment of Abstracts’ services in these sciences to serve as tools of 
research. 

Professor Niles Carpenter, Department of Sociology, University of 
Buffalo, spoke on the subject of “Social Science Abstracts as a Tool of
Research.” He stressed the indispensability of the Abstracts as a 
means of keeping up to date with the scientific literature of the field. 
In this connection he analyzed the 277 entries devoted to sociology in 
the December, 1930, issue of the Abstracts, assuming that the Grosvenor 
Library of Buffalo, New York, was typical of university libraries in 
general and that the issue analyzed was a fair sample of the Abstracts. 
He concluded that the sociologist who fails to use the Abstracts is miss- 
ing approximately seven-eighths of the available current literature. 
If, however, the library mentioned is not accepted as typical, the Ab- 
stracts still remains indispensable because the issue analyzed cited 163 
references in sociology in foreign languages, chiefly from German, 
French, Russian and Italian. Further than this, the Abstracts enables 
the sociologist to become familiar with significant developments in re- 
lated fields with a minimum of effort and expenditure of time. Pro- 
fessor Carpenter suggested by way of improvement that the Abstracts 
should contain a greater amount of statistical material, even occasional 
tables or graphs of outstanding significance. 

Professor Esther Cole, Department of Political Science, University 
of Kentucky, continued the discussion. She stated that Abstracts 
serve for her three major purposes: (1) they have proved to be a con- 
stant aid in teaching university classes in political science; (2) they 
have enabled her to keep in touch with current materials in her own 
field which are not otherwise available; (3) they have brought an ever-broader
realization of significant relationships between the various
social studies. She stressed the value of the Abstracts to scholars whose 
remoteness from centers where bureaus of research are located is a 
handicap in keeping up with periodical literature. In this connection 
Social Science Abstracts serves as a central clearing house for the worker 
in a specialized field. Highly technical studies are included in the ab- 
stracting chiefly by title or by a concise descriptive sentence. The 
Abstracts recognizes the twilight zones between the various fields of 
knowledge, and urges the scholar to approach his own problem in the 
light of these interrelations. Consequently, consistent and fairly 
thorough reading of the Abstracts is found to be illuminating. 

Professor Hornell Hart, Department of Sociology, Bryn Mawr Col- 
lege, stated that though there was a time when an apprentice in social 
research would seek out some master in his field and would learn the 
traditions and techniques of his specialty from him, we now have begun 
to recognize that no single Aristotle, Comte, Spencer or Ward can fur- 
nish adequate contacts with developments in social science. He said 
that scientific thinking is not confined to the countries which speak 
French, English and German but intellectual coördination can only be
effective on a world scale. Social Science Abstracts has made it pos- 
sible for the modern sociologist to keep in touch directly with the 
greater part of the matter published in English in his special field, and 
also in the other languages. In concluding, he stated that the sociolo- 
gist who fails to read, clip and file his personal copy of Social Science 
Abstracts is comparable to a man attempting to go into modern business 
who feels that he cannot afford to install a telephone. 

Professor Susan Kingsbury, Department of Economics, Bryn Mawr 
College, said that the Abstracts might advantageously cover more Ger- 
man periodicals, judging by the December, 1930, issue as a sample. 
The American scholar would depend upon the Abstracts for inclusive- 
ness of foreign journals. This would be particularly important with 
reference to Russian periodicals. Dr. Kingsbury felt that the Abstracts
had not yet succeeded in reaching much of the important Russian
material in the field of economics and sociology. She felt that the
service for foreign languages is more important than that for English 
and that the Abstracts should work towards greater inclusiveness of 
foreign journals. 

Professor Royal Meeker, of New Haven, said that he realized the 
tremendous importance of placing summaries of all important articles 
in the hands of students, teachers and researchers in the social sciences, 
but was more than skeptical about the possibility of ever achieving suc- 
cess in this task. Dr. Meeker stated that he hailed the proposal with 
delight but with many misgivings, for he was assailed with doubts. 
He expressed his gratification to members of the staff of Social Science 
Abstracts for their brilliant success in establishing this journal. He con- 
cluded by saying that there is a danger in the Abstracts, as there is in all 
publications, and that is that it lures the researcher to neglect his own 
work in order to read the impossibly vast literature available about the 
work of other researchers. 

THE FEDERAL UNEMPLOYMENT CENSUS OF 1930 


By Mary van Kleeck


Proposals for a census of unemployment in the United States, to be 
taken in connection with the Federal census of population in 1930, 
were made officially by the Senate Committee on Education and Labor 
in the report of its investigation of unemployment in March, 1929. 
The Committee’s first recommendation was the “continuous collection 
and interpretation of adequate statistics of employment and unem- 
ployment.” The basic need in such a statistical system was the 
establishment of a “norm,” whereby to appraise the significance of 
sample measurements of shrinkage and increase in employment and 
unemployment, such as are made by the Federal Bureau of Labor 
Statistics and by state departments of labor. The establishment of 
this norm was to be achieved by a complete census of unemployment, 
for which the decennial census of population seemed to offer a golden 
opportunity.¹

The purpose of this paper is to examine the unemployment census in 
the light of its intended use—a basic, inclusive count from which to 
measure current statistics of change in employment, to the end that 
government and industry might be equipped with knowledge of the 
extent of unemployment, as a guide in constructive measures of relief 
and prevention. 

Preliminary Results of the Census of Unemployment.—The census was 
taken in April and May, 1930. At first it was intended to give out no 
data until the whole could be tabulated in Washington, but public 
interest led to plans for a hasty count of the number of persons enumer- 
ated as usually employed but without a job the day preceding the 
enumeration, though able to work and looking for work. On June 26 
the Department of Commerce issued to the press the following state- 
ment, under the title “Preliminary Returns from the Unemployment 
Schedule”:


According to preliminary returns covering about one-fourth of the population 
of the country, the total number of persons usually having a gainful occupation, 
who were reported at the time of the census in April as having no job, although 
able to work and looking for a job, amounted to 574,647, or 2.0 per cent of the 
total population of the territory covered, which numbered 29,264,480. The 
figures cover 756 counties and 75 cities not included in these counties. 


1 “Assembly of Statistics on Unemployment Essential, Declares Senate Committee,” United States
Daily, March 8, 1929, p. 2. 

Attached to the statement of which these sentences were the sum- 
mary, was a comment by the Secretary of Commerce: 


I have inspected returns from the localities from which these figures originate. 
They appear to be a representative sampling of the country. The figures applied 
to the whole population would indicate much less unemployment than was gen- 
erally estimated; these unemployment figures also include normal unemployment 
of persons shifting from one job to another. Since the time of this census there 
has been the usual increase in employment in various seasonal occupations. 


The statement from the Census Bureau contained the caution: 
“Figures are to be given out later for persons who reported that they 
had a job but were sick or temporarily laid off; and for those usually 
employed but physically unable to work at the time of the census, etc.” 

The first returns for the country as a whole were issued in Washing- 
ton on August 23. Three days earlier a statement had been given out, 
explaining for the first time the Bureau’s method of classifying the 
workers reported idle on the day preceding the enumerator’s visit. 
Class A, which was the only one covered in all the returns so far given, 
was defined as including “persons out of a job, able to work and looking
for a job.” In the entire continental United States they numbered
2,508,151,¹ or 2.0 per cent of the total population of 122,698,190.
Class A was held by the census officials in Washington “undoubtedly”
to “constitute by far the major part of the total number of unemployed 
under any definition that might reasonably be adopted.” 
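
The arithmetic behind the preliminary and the national Class A figures can be checked directly. The sketch below uses only the numbers quoted in this paper; the variable names are introduced here, and the sample population is taken as published on June 26, although it was later found to have been overestimated.

```python
# Preliminary returns of June 26 (about one-fourth of the country).
sample_unemployed = 574_647
sample_population = 29_264_480   # as published; later found to be overestimated

# Returns for the entire continental United States, issued August 23.
national_unemployed = 2_508_151
national_population = 122_698_190

# Class A as a percentage of total population.
sample_rate = sample_unemployed / sample_population * 100
national_rate = national_unemployed / national_population * 100

# The popular computation: multiply the preliminary count by 4.
times_four = sample_unemployed * 4
shortfall = national_unemployed - times_four

# Share of the national population actually covered by the preliminary area.
sample_share = sample_population / national_population * 100

print(round(sample_rate, 1), round(national_rate, 1))
print(times_four, shortfall)
print(round(sample_share, 1))
```

Both percentages round to 2.0, yet multiplying the preliminary count by 4 falls short of the national count, because the area covered was less than a fourth of the country.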

Of Class B it was at the same time stated that it 


will be made up in part of persons who are working on part time or who are idle 
for relatively short periods for seasonal or other temporary reasons, though it 
will include also many persons who have been laid off from their jobs for relatively 
long periods, some of whom are in very much the same economic position as the 
jobless in Class A. Class B will be particularly important in certain counties 
where coal mining is a prominent industry, since the coal miners tend to report 
that they still have a job, even though they have done no work for a fairly long 
period. Several of the other classes are composed mainly of persons who would 
not, even under the most elastic interpretation of the term, be considered unem- 
ployed. The schedule questions were made to include all persons usually 
working at a gainful occupation but not at work on the last regular working day 
preceding the enumerator’s call, however, in order to make sure that no persons 
actually unemployed should be omitted by reason of the enumerator’s misunder- 
standing of a more restricted definition. 


1 The statement of June 26, that the population of the first area was one-fourth of that of the whole 
country (later found to be less than a fourth), had led to multiplying by 4 the census statement of the 
number unemployed, giving a total of 2,298,588. Though this figure was not given by the Census, the 
official announcement led naturally to its computation, which proved to be an underestimate. More- 
over, there was an error in the total count of the population, which was overestimated in this area. 

Criticisms of the Census.—From the moment when the first preliminary
returns were announced on June 26, the census has been challenged.
The New York State Industrial Commissioner called them premature 
and not designed “to give a true picture.” In so far as New York
State was concerned, they gave too great weight to rural areas. More- 
over, to give the percentage in terms of the aggregate population was 
not a significant figure. The important fact was the percentage of the 
wage-earning population, and it would have been better to have withheld
the figures a month or two for more complete data.¹
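
The Commissioner's point about the base of the percentage can be shown with a short computation. The unemployed and population counts are those of the June 26 statement; the share of the population that is gainfully occupied is a purely hypothetical figure chosen for illustration, since no such figure is given in this paper.

```python
unemployed = 574_647          # preliminary Class A count, June 26 statement
population = 29_264_480       # total population of the area covered, as published

# HYPOTHETICAL assumption: 40 per cent of the population is gainfully occupied.
assumed_gainful_share = 0.40
gainful_workers = population * assumed_gainful_share

# The same count expressed against two different bases.
rate_of_population = unemployed / population * 100
rate_of_gainful = unemployed / gainful_workers * 100

print(round(rate_of_population, 1), round(rate_of_gainful, 1))
```

Whatever the true share may be, the percentage of the wage-earning population is necessarily larger than the percentage of the aggregate population, since the base is smaller.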

Some newspaper and magazine comment was particularly severe. 
The New York World, on July 11, 1930, said: “The Department of
Commerce is again offending common sense by putting forth alleged
figures of unemployment.” The New Republic, on August 20, commented
on the “attempt (of the Census Bureau) to minimize the number
of the unemployed,” and even concluded that “it is impossible not
to infer that the administration was glad to have the underestimates 
pass as good currency.” 

The resignation of Dr. Charles E. Persons from his position as 
expert on unemployment in the Census Bureau brought to light his 
view of what he regarded as inadequacy in counting the unemployed. 
In a signed article in the New York World, he objected to the impression
given that data for the jobless only gave a fair measure of unemployment,
and he made a number of other criticisms of the validity of
deductions from the areas included in preliminary announcements, the 
scanty representation of industrial states in the first returns, and an 
overestimate of the proportion of the total population included in this 
sample. He held that all of these factors reduced the total estimate of 
unemployment. 

A former Commissioner of Labor Statistics, Dr. Royal Meeker, 
issued a statement which concluded: 


A careful analysis of this first census report (of August 23) indicates that while 
the figures are probably correct, they invite misunderstanding and have been 
grossly misinterpreted because of the narrowly limited definition of unemploy- 
ment used in the census. Those who use these statistics should know what 
they mean. 


These criticisms are cited to show the importance of a clear analysis 
now of the whole procedure of the Bureau of the Census in taking the 
census of unemployment and in compiling the results. 

Procedure in Planning and Taking the Census.—In the summer of 
1929, following the enactment of the law which provided for the taking 


1 “Census Statistics on Employment Are Challenged,” United States Daily, June 30, 1930, p. 1.
2 New York World, August 11, 1930.



















of the unemployment census, the Bureau appointed an advisory committee
to confer regarding the schedule.¹ An economist was appointed
as expert on unemployment in the division of the Bureau charged with 
taking the census of population. A special section was set up to code 
and tabulate the returns. All of these steps were a recognition of the 
special problems presented in this part of the census. 

Questions on the Schedule.—Entries on the population schedule 
showed the occupation and industry of the gainfully employed and 
whether or not the person was at work the last regular working day 
before the enumeration. For the gainfully employed who were not at 
work on that day, entries were to be made on the unemployment 
schedule. The following outline shows the headings related to unem- 
ployment on these two schedules: 


POPULATION SCHEDULE 
(Questions Related to Employment and Unemployment) 


OCCUPATION AND INDUSTRY:

Column 25—Occupation—Trade, profession, or particular kind of work, as spinner, 
salesman, riveter, teacher, etc. 

Column 26—Industry—Industry or business, as cotton mill, dry-goods store, shipyard,
public school, etc. 

Employment—Whether actually at work yesterday (or the last regular working day).

Column 28—Yes or No. 

Column 29—If not, line number on Unemployment Schedule. 


UNEMPLOYMENT SCHEDULE ²


(To be Used in Connection with the Population Schedule) 


Column 4—Name—of each person who usually works at a gainful occupation but did 
not work yesterday (or on the last regular working day). 

Column 5—Does this person usually work at a gainful occupation? Yes or No. 

Column 6—Does this person have a job of any kind? Yes or No. 


IF THIS PERSON HAS A JOB:


Column 7—How many weeks since he has worked on that job? 

Column 8—Why was he not at work yesterday? (Or in case yesterday was not a 
regular working day, why did he not work on the last regular working day?) 
For example, sickness, was laid off, voluntary lay-off, bad weather, lack of materials, 
strike, etc. 

Column 9—Does he lose a day’s pay by not being at work? Yes or No. 


1 The membership of the advisory committee was as follows: Joseph H. Willits, University of Penn- 
sylvania; William Green, American Federation of Labor; J. Chester Bowen, Federal Bureau of Labor 
Statistics; W. A. Berridge, Metropolitan Life Insurance Company; George E. Roberts, National City 
Bank; L. W. Wallace, American Federated Engineering Societies; E. Dana Durand, Bureau of Foreign 
and Domestic Commerce. 

2 Certain columns which were designed for coding purposes and for connecting with sheet and line
number on the population schedule are omitted. 



















Column 10—How many days did he work last week? 
Column 11—How many days in a full-time week? 


IF THIS PERSON HAS NO JOB OF ANY KIND:

Column 12—Is he able to work? Yes or No. 

Column 13—Is he looking for a job? Yes or No. 

Column 14—How many weeks has he been without a job? 

Column 15—Reason for being out of a job (or for losing his last job) as plant closed 
down, sickness, off season, job completed, machines introduced, strike, etc.


The Definition of Unemployment.—Census officials explain that they 
have refrained from giving a definition of unemployment. They have 
sought to have an enumeration made of those not at work on a given 
day, leaving to the users of statistics the decision as to which groups 
they would include for a particular definition. 

As the outline of the schedules shows, the starting point of the 
unemployment count was the answer “No” in Column 28 of the 
population schedule for a person for whom the occupation and industry 
had been recorded. A further check on the occupational status was in 
Column 5 of the unemployment schedule, “Does this person usually
work at a gainful occupation?” A negative there, unless obviously an
error in the light of supporting data on the two schedules, would result 
in throwing out the unemployment entry. Thus the first effort was to 
eliminate those not usually gainfully employed, including young per- 
sons who have not yet begun to work for wages, though looking for 
work, old persons who have retired, and “those who, for any reason,
decline to work or choose not to work.” The gainfully employed
included all engaged in an occupation yielding any income, whether or 
not the work was regularly part time, even as little as a day or two a 
week. Those in business on their own account and independent pro- 
fessional workers were included with those working for an employer and 
were considered as employed on the specified day if they “put in time 
at their place of business.” 

It will be noted from the unemployment schedule that two main 
divisions were established between the person who “has a job” and the
person who “has no job of any kind.” 

According to instructions to enumerators a person “has a job” if 
he “expects to return to his former job.” A definite contract was not 
necessary to establish this status. Building trades employees were so 
recorded 


if their customary employer has work in sight. And men temporarily laid off at 
a factory, mill, or mine are to be so returned if they expect to be taken on again 
in their former places. Difficulties will arise because of the length of the period 



















of idleness. Endeavor to ascertain whether there is reason to expect the closed 
plant to reopen, and if so, return the individual as possessed of a job.¹


If a person was recorded as having no job, and if “Yes” was the
answer to “Is he able to work?” (Column 12) and “Is he looking for a
job?” (Column 13), he became one of the now famous Class A, “having
no job, although able to work and looking for a job.” It will be 
observed that in many instances the dividing line was faint between 
this class and those who, while believing that they had jobs, were out 
of work, some of them for many weeks. 

Those having jobs were further asked how many days they worked 
last week, and how many days there were in a full-time week, thus 
giving a basis for determining what portion of the preceding week those 
out of work less than a week had been employed.² Both groups were
asked how many weeks they had been idle. 

The Basis for Group Classification.—The census has established the
following classes of the unemployed: 


Class A—Persons out of a job, able to work, and looking for a job. 

Class B—Persons having jobs but on lay-off without pay, excluding those sick or 
voluntarily idle. 

Class C—Persons out of a job and unable to work. 

Class D—Persons having jobs but idle on account of sickness or disability. 

Class E—Persons out of a job and not looking for a job. 

Class F—Persons having jobs but voluntarily idle without pay. 

Class G—Persons having jobs and drawing pay, though not at work (on vacation, etc.). 


The basis for these classifications was partly in definite answers on 
the schedule and partly in the statement of reasons for idleness. Thus 
Class A was made up of those for whom the answers in Columns 12 
and 13 were “Yes.” It was the ease of counting these two adjoining
columns that led to the prompt announcement of the numbers included
in Class A. The assumption was also made by the Census Bureau that
this was the most significant class—an assumption which may be 
doubted in the light of the returns for Class B. In determining all the 
other classes the enumerator’s statement of reasons for idleness was an 
important factor. From those returned as not having jobs, those 
unable to work were separated and counted in Class C, and those not 
looking for a job were counted as Class E. To form Class B, “persons 

1 Fifteenth Census of the United States, ‘‘ Instructions to Enumerators, Population and Agriculture,” 
Revised, p. 44. 

2 Some have assumed that this would give the number on part time, but the assumption is erroneous,
since the count excludes part-time workers who were employed any part of the work day preceding the
enumeration. Nevertheless, in the Census Bureau the intention is to present the data showing the number
who had worked less than full time in the preceding week and the proportion of working days which
they were employed, in the hope of giving a basis for estimating the corresponding percentages of
part-time workers who were excluded from the unemployment count.










having jobs” were counted if they were “on lay-off without pay,”
excluding the sick or disabled, who were placed in Class D, and the
“voluntarily idle, without pay,” who were placed in Class F. Those
“drawing pay, though not at work (on vacation, etc.),” as indicated by
the answer “No” in Column 9, were placed in Class G.
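
The sorting rules just described may be summarized in a short sketch. The Python below is illustrative only and is not the Bureau's coding procedure: the field names (`has_job`, `loses_pay`, `able_to_work`, `looking_for_job`, `reason`) are hypothetical stand-ins for Columns 6, 9, 12, 13 and for the reasons entered in Columns 8 and 15, and the reduction of those reasons to three labels is an assumption made for brevity. The order in which overlapping cases are resolved (for example, a person both unable to work and not looking for a job) is likewise assumed, since the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One line of the unemployment schedule (hypothetical field names)."""
    has_job: bool                            # Column 6
    loses_pay: Optional[bool] = None         # Column 9, asked of those with a job
    able_to_work: Optional[bool] = None      # Column 12, asked of those without a job
    looking_for_job: Optional[bool] = None   # Column 13, asked of those without a job
    # Assumed three-way simplification of the reasons in Columns 8 and 15:
    # "sickness", "voluntary", or "industrial".
    reason: str = "industrial"


def classify(e: Entry) -> str:
    """Return the class letter (A to G) for one schedule entry."""
    if not e.has_job:
        if e.able_to_work is None or e.looking_for_job is None:
            raise ValueError("Columns 12 and 13 are needed when there is no job")
        if not e.able_to_work:
            return "C"   # out of a job and unable to work (assumed to take precedence)
        if not e.looking_for_job:
            return "E"   # out of a job and not looking for a job
        return "A"       # out of a job, able to work, and looking for a job
    if e.loses_pay is None:
        raise ValueError("Column 9 is needed when the person has a job")
    if not e.loses_pay:
        return "G"       # has a job and draws pay though not at work
    if e.reason == "sickness":
        return "D"       # has a job but idle through sickness or disability
    if e.reason == "voluntary":
        return "F"       # has a job but voluntarily idle without pay
    return "B"           # has a job but on lay-off without pay
```

Only Class A and Class G rest wholly on Yes-or-No answers in this sketch; every other branch depends on the `reason` label or on an assumed order of precedence.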

Reasons for idleness as a basis for this classification offer several 
difficulties. In the first place, the column headings on the schedule 
merely gave illustrations of probable reasons, for the guidance of 
enumerators. For instance, “sickness” was among the illustrative
reasons, but there was no requirement that in every instance the 
question should be put as to whether sickness was the cause of idleness. 
Thus it is not safe to say that Class D will show the extent of sickness 
as a cause of unemployment. It will show merely the number of 
instances in which the enumerators listed sickness or disability as a 
cause of idleness for those who had jobs. The reasons actually given 
require an alphabetical index of six and one-half pages of double 
columns in the book of coding instructions. For coding they are 
gathered into 12 groups and 48 subgroups. It is interesting material 
as descriptive of reasons for unemployment, but as a basis for counting 
numbers it seems clearly lacking in precision. 

Secondly, for a person having no job there was some confusion, even 
in the instructions, as to whether the reason for idleness should be the 
reason for losing the last job or for being unable to obtain a new one. 
In general, if unemployment had lasted for some time the enumerator 
was asked to record the reason for not being able to find a job at 
present. Obviously the answer to this question would not always be a 
clearly objective fact, since the worker does not always know the 
reason for his lack of success in finding a job. Third, in separating out 
from the group of persons having jobs those who were “voluntarily 
idle, without pay,” there was room for error. What, for instance, is a
“voluntary lay-off,” and how does it differ from the preceding illustrative
reason in the schedule, “was laid off”? The word “voluntary”
was understood to refer to personal choice or personal reasons, but may 
not always have been clear both to the enumerator and to those inter- 
viewed. Fourth, the separation of Class E, “persons out of a job and 
not looking for a job,” from Class A, the jobless who were looking for 
jobs, is by no means justified in all instances. The implication is that 
those in Class E do not wish to work. Clearly, however, in towns 
having only one industry or even in larger cities where unemployment 
has been widespread, workers have ceased to look actively for work. 
Yet they are part of the problem of unemployment. 

Difficulties in Enumeration.—To those handling them the schedules 













must have given a vivid picture of the diversity of economic status of 
workers in the United States. An enumerator in a southern state 
wrote a careful letter which her supervisor transmitted to Washington, 
commenting on her problems. She was enumerating farm laborers who 
make a living in seasonal activities, “chopping cotton” and gathering
vegetables in the spring and picking cotton and breaking fodder in the 
fall. Generally they are paid in cash for this work, but their basic 
compensation is being housed on plantations, with privileges of gather- 
ing wood, in return for which they must be ready to work when needed. 
These are almost feudal conditions. At the other extreme of industrial 
development were 200 workers, mainly from steel mills, who were 
interviewed in a jail in a mid-western town where they sought commit- 
ment for vagrancy to get a night’s lodging—a vivid picture of insecurity 
in modern industry. 

It would lengthen this paper unduly to discuss the questions misinter- 
preted by enumerators and by persons interviewed, and the various 
ambiguities and inconsistencies arising out of the differences in individ- 
ual instances and out of the circumstances of enumeration. For 
correcting these, careful provision has been made in the procedure of 
coding and verification before punching the tabulating cards in 
Washington. A classification has had to be formed for rejected entries, 
such as the lack of a record on the population schedule, or the lack of 
an entry on the unemployment schedule for a person checked as un- 
employed in the population count, or the lack of a record of the usual 
occupation. Those engaged in coding and tabulating are encouraged 
to bring to their supervisors these difficult questions. Often an 
examination of the material as a whole makes it possible to supply 
missing answers and correct inconsistencies. It should be observed,
however, that necessary rejections reduce the count of the unemployed. 
Nor can that treasured asset of the statistician, compensating errors, be 
relied upon to restore the balance. 

Only when the returns are all tabulated will it be possible to measure 
the effect of rejections and to gauge the result of division into classes. 
At that time, too, some idea can be gained as to the effectiveness of this 
count of the unemployed in measuring economic distress in different 
regions. In farming communities many who were at work some part 
of the day preceding the enumeration and hence were counted as 
employed may have been as badly off economically as the automobile 
mechanic who had no work that day. The fact is that economic status 
has such different meanings in different occupations that to find a 
common measure is baffling. It seems clear that this count of the 
number of workers who lacked work and wages was most significant 






















for those who ordinarily work full time for others, in the relation of 
employee to employer, in industries other than agriculture and the 
professions. 

Evaluation of the Unemployment Census.—For the first time in the 
United States all the gainfully employed have been counted as employed
or unemployed in the same relatively short period of time.¹ As
a base line from which to measure the significance of trends of employ- 
ment in the current employment statistics, the use of these data will 
depend upon whether the industrial groupings which are possible from 
the population schedule are fully tabulated. The percentages of idle- 
ness in these different industrial groups will then be a basis of com- 
parison. As the census was taken in a period of unusual depression, 
it will not be possible to regard it as a count of the “normally unemployed,”
but the indices of employment and earnings currently computed
by the Federal Bureau of Labor Statistics and by certain state
bureaus will make it possible to place this count of April, 1930, in the 
scale of trends in employment. Of course, the old difficulty in labor 
statistics—differences in terminology in industrial classification—will 
require care and ingenuity in handling the material; but the obstacles 
will not be insurmountable, if the occupation census gives the data in 
the form needed for this and other purposes. 

In addition to the statistics resulting from the count, the data as a 
whole will give a basis for some illuminating studies of the incidence of 
unemployment in different groups—men as compared with women, 
Negroes as compared with white workers, workers under 35 as com- 
pared with workers over 35, and wage-earners in different types of 
communities, one-industry towns, large cities and rural areas. At this 
point it should be noted that the plan for tabulating and publishing 
the data from the population schedule will determine the possibilities 
for these further studies of unemployment. For example, smaller age 
groupings will be needed than in past censuses, where the interval 
above 34 has been ten years. The unemployment schedules are being 
tabulated at five-year intervals from 10 to 70. 

In short, though final appraisal must wait on complete publication, 
this census of unemployment, with all its inherent difficulties, has given 
data, hitherto wholly lacking, upon which to make more accurate cur- 
rent estimates and to build more significant sample studies than have 
been possible in this country before. When the proposed sample count 
has been made on January 15, 1931, in 20 cities, further useful data will 


1 In past censuses in 1880, 1890, 1900 and 1910 the effort has been made to discover how many were
unemployed and for how long in a whole year, and although the data available are interesting and are 
being reviewed now in the Bureau as background for the present census, they could not supply the 
“norm” which was the purpose this year. 
















be added to the 1930 material. The plan is to use the same schedule 
and to employ, as far as possible, the same enumerators. This should 
insure a fair measure of comparability. 

Meanwhile it is possible to point out certain disadvantages in the 
present count. 

(1) The most serious omission was the lack of a count of those on 
part time either by the day or by the week. To make this possible, 
those who were regularly on part time should first be counted out, and 
then it would have been necessary to make a count covering a week 
instead of a single day, or in some way to provide for those workers on 
reduced time who happened to be at work on the day of the enu- 
meration. 

(2) The important use made of reasons for idleness suggests that the 
whole procedure in taking the enumeration at this point needs careful 
examination. The question is raised as to whether certain main 
categories—such as (a) those idle for personal reasons, (b) those idle 
because of sickness or disability—might not have been established and 
an answer required in all instances, leaving as a residue idleness which 
might be said to be due to industrial and economic causes. Certainly 
the illustrative reasons might have been more wisely chosen. The 
difficulty here was that the whole subject of reasons for idleness is in a 
sense a piece of investigation rather than a count, and the enumerators 
could hardly have been expected to be equipped for investigation unless 
the questions were so phrased as to be as nearly as possible uniform in 
their interpretation. 

(3) The separation of those working on their own account, the inde- 
pendent professional workers not on salary and certain other groups 
not having the status of wage-earners, was recommended by the Com- 
mittee on Governmental Labor Statistics. The census, however, 
included them all on the theory that it was better not to depend upon 
the enumerators to make these distinctions. 

(4) The emphasis put upon Class A, those without jobs, and the 
distinction maintained between this group and Class B, who were said 
to have jobs, though workless and wageless, has been confusing to the 
public and has exposed the Bureau to the criticism of seeking to 
minimize the seriousness of unemployment. Moreover, the separation 
even from Class B of those on “voluntary lay-off” and other more or 
less obscurely stated reasons which were held to be evidence of “voluntary
idleness” has not had a precise enough basis to justify placing
much weight upon these classifications. 

(5) A final criticism relates to the interpretation given to the data 
by administrative officials. The statement by the Secretary of 

























Commerce, already quoted, told the country that after examination of 
the preliminary returns in June, 1930, “the figures applied to the whole
population would indicate much less unemployment than was generally
estimated.” The President, in his message to Congress in December,
stated that “the number of those wholly out of employment seeking for
work was accurately determined by the census last April as almost 
2,500,000.” Both of these statements were, of course, based upon the 
one group designated as Class A. The announcements made by the 
Bureau of the Census have lacked clarity on these points and have laid 
the Bureau open to the criticism of having permitted an underestimate 
to go to the country at the very moment when programs of relief should 
have been under way in many communities. Industry also needed the 
information as a guide. 

What the Partial Returns from the Unemployment Census Have Shown, 
December, 1930.—Preliminary returns published August 23 showed 
2,508,151 persons without a job, able to work and looking for a job. 
This represented 2 per cent of the total population. Data for cities of 
100,000 or more, announced September 4, showed considerably more 
than 2 per cent out of work in the great majority of these cities. Par- 
tial returns from one or two states, available in December, showing 
industrial groups, revealed still greater differences between industries 
in the percentage of total unemployment as defined in Class B. The 
first statement to make, therefore, about what the census has shown is 
that generalizing for the country as a whole has very little significance. 
The important point is to concentrate attention upon industries and 
localities in which unemployment has been shown to be most 
severe. 

Only partial returns have been published to give a basis for estimat- 
ing the comparative size of the groups in different classes. On De- 
cember 13, returns for 25 states, the District of Columbia and the 
cities of Buffalo, Philadelphia and Rochester showed that Class B, the 
involuntarily idle and wageless, though having jobs, constituted for 
that area a group equal to 33.3 per cent of Class A. If this percentage 
be applied to the 2,508,151 counted as in Class A, it will add 835,214, 
giving a total of 3,343,365 who were unemployed and without pay on 
the working day preceding the enumerator’s call and who were not 
voluntarily idle, sick or disabled. It has already been indicated that 
some of this class were on reduced time, but there is no way of dis- 
covering how many. 
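
The extension of the partial returns to the whole country can be checked with a few lines of arithmetic. The sketch below is a hedged illustration in Python; the variable names are introduced here, the sample counts for Classes A and B are those reported for the same area in Table I, and the truncation to whole persons is an assumption about how the published figure was obtained.

```python
# Counts for the 25 states, the District of Columbia and three cities (Table I).
class_a_sample = 567_540
class_b_sample = 188_870

# Class A for the whole country, from the preliminary returns of August 23.
class_a_national = 2_508_151

# Class B as a proportion of Class A, kept in tenths of one per cent
# so that the remaining steps use whole-number arithmetic.
b_per_a_tenths = round(1000 * class_b_sample / class_a_sample)   # 333, i.e. 33.3 per cent

# Apply the same proportion to the national Class A count (assumed truncation).
class_b_national = class_a_national * b_per_a_tenths // 1000
total_a_and_b = class_a_national + class_b_national

print(b_per_a_tenths / 10)   # 33.3
print(class_b_national)      # 835214
print(total_a_and_b)         # 3343365
```

The result stands or falls with the assumption that the ratio of Class B to Class A in the tabulated area holds for the areas not yet tabulated.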

It should be added that the Census Bureau, in making this announcement
of December 13, indicated that the areas included “are
not quite typical of the country as a whole, but include many areas in 
















which unemployment is less serious than in some of the states for 
which the detailed tabulation is yet to be completed.” 
Table I shows the preliminary returns for these areas by classes. 











TABLE I

                                                                Total for 25 states,
Class                                                           District of Columbia
                                                                and three cities

Population                                                            42,858,298
Class A—Persons out of a job, able to work, and looking
    for a job                                                            567,540
    Per cent of population                                                   1.3
Class B—Persons having jobs but on lay-off without pay,
    excluding those sick or voluntarily idle                             188,870
    Per cent of population                                                   0.4
Class C—Persons out of a job and unable to work                           49,453
Class D—Persons having jobs but idle on account of
    sickness or disability                                                87,078
Class E—Persons out of a job and not looking for a job                    26,103
Class F—Persons having jobs but voluntarily idle, without pay             30,558
Class G—Persons having jobs and drawing pay, though not
    at work (on vacation, etc.)                                           32,262











We may at once omit from consideration Class G, who were drawing 
pay, and Class F, who were voluntarily idle, though, as already indi- 
cated, this may have included some who were laid off for industrial 
causes. If we take half of Class E, on the assumption that some of 
these had retired from occupations (though the population schedule 
should not have counted them as gainfully employed), and add the 
resulting number, 13,051, to Classes C and D, we have a total of 
149,577 in this area, or 26 per cent of Class A, who were out of work 
and out of wages and must therefore be considered in any program of 
relief. If the same proportion applies to the country as a whole, it 
will add to the count of the unemployed already given 652,119, making 
a total of 3,986,484 who in April, 1930, were out of work and out of 
wages for all causes, excluding the voluntarily idle and those who were 
drawing pay. It would appear that all must be considered in any 
program of relief and prevention. 
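
The steps of this estimate can be retraced as follows. The Python sketch is illustrative; the variable names are introduced here, and the class counts are taken as they appear in Table I of this text. Run on those figures, it reproduces the 26 per cent ratio and the addition of 652,119, but the two sums come out at 149,582 and 3,995,484 rather than the 149,577 and 3,986,484 given above. The sketch makes no attempt to settle which printed figure is at fault.

```python
# Counts for the tabulated area, as printed in Table I.
class_a_sample = 567_540
class_c = 49_453
class_d = 87_078
class_e = 26_103

# National figures given earlier in the text.
class_a_national = 2_508_151
a_plus_b_national = 3_343_365

half_e = class_e // 2                                    # 13,051
sample_extra = class_c + class_d + half_e                # Classes C and D plus half of E
pct_of_a = round(100 * sample_extra / class_a_sample)    # whole per cent of Class A

# Apply the rounded percentage to the national Class A count.
national_extra = class_a_national * pct_of_a // 100
national_total = a_plus_b_national + national_extra

print(half_e)           # 13051
print(sample_extra)     # 149582
print(pct_of_a)         # 26
print(national_extra)   # 652119
print(national_total)   # 3995484
```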



















STATISTICAL THEORY OF EVOLUTION 


By SEWALL WRIGHT


A recent writer on evolution, Dr. R. A. Fisher, has made an interest- 
ing comparison between the position of the evolutionary principle in 
biology and the second law of thermodynamics in the physical sciences. 
He quotes Eddington to the effect that the law that entropy increases 
holds the supreme position among the laws of nature and notes that the 
principle of evolution holds a similar position among the biological 
sciences. Both describe irreversible processes and thus mark a direc- 
tion in time, the law of increase of entropy, according to Eddington, 
being unique among the physical sciences in so doing. Both are sta- 
tistical laws. Dr. Fisher notes, however, a remarkable contrast be- 
tween them. The operation of the second law of thermodynamics 
brings about a disorganization of the systems concerned, a passage from 
less probable to more probable states. Evolution, on the other hand, is 
nearly always described in terms of progress, a passage from simple to 
complex organization, from more probable to less probable states. 
Fitness takes the place of entropy in the formulation. 

Whether evolution is a mere eddy in a general process of running 
downhill in the universe, as Eddington doubtless would hold, or 
whether the developmental side of nature so conspicuous in the bio- 
logical sciences is an aspect of reality more basic than increase of 
entropy in physical systems, or whether time is essentially without 
direction one way or the other—as G. N. Lewis holds, at least with 
respect to the physical sciences—are philosophical questions which I 
shall not attempt to discuss. The comparison sufficiently brings out 
the difficulty of accounting for the evolutionary process on the same 
basis, statistical theory, as that which leads to the law of increase of 
entropy in the physical sciences. Yet it seems the only course open to 
scientific analysis. 

The first attempts at explanation, to be sure, interpreted the process 
as directly physiological rather than statistical. Lamarck assumed 
that the physiological adaptations of organisms to varying environments
produced parallel changes in the heredity which they transmitted.
Experimental studies of heredity and of development have
shown that this interpretation is not available. The observed properties
of variation, for one thing, are as far as possible from those
postulated by this theory. 

Darwin was the first to present effectively a sketch of a statistical 



















interpretation, the play of natural selection upon random hereditary 
variability. But practically nothing was known in Darwin’s time of 
the physical mechanism of heredity. He merely assumed that there 
was a tendency toward persistence of type, qualified by small random 
variations. Since the rediscovery of Mendelian heredity in 1900, the 
subject of heredity has changed from one which was a plaything of 
every speculative writer, biological or otherwise, to one in which we 
have as much exact knowledge as, perhaps, in any other field of biology. 
This knowledge has raised some difficulties of which Darwin was not 
aware. It includes, however, mathematically expressed rules whose 
statistical consequences, in populations, can be worked out with some- 
thing like the confidence that they correspond to realities, which we 
find in the physical sciences. 

The basic fact of modern genetics is, of course, that heredity is com- 
posed of units, “‘genes,’’ whose most essential property is that of dupli- 
cating themselves with most extraordinary precision (as determined 
by effects under controlled genetic and environmental conditions), 
quite regardless of the characteristics of the organism in whose cells 
they are carried. The effects on the developmental process, dependent 
on the general genetic and environmental situation, constitute second- 
ary properties. It is the property of autosynthesis of a doubtless 
highly complex material which makes possible the apparently disen- 
tropic aspect of evolution. Certain highly “improbable” states of 
organization are hereby multiplied instead of being dissipated as in 
ordinary thermodynamic systems. 

Absolute precision of gene duplication is, however, incompatible 
with evolution. The exceptions, so-called ‘‘gene mutations,” have 
been much studied of late. Their properties at first sight seem as far 
as possible from those required for progressive evolution. The typical 
rate of mutation for individual genes can hardly be more than 1 per 
million per generation. Direction of mutation has no relation to ex- 
ternal conditions, although at least one agent, X-rays, greatly increases 
the rate. Mutations are practically never adaptive. They are 
usually definitely injurious, although sometimes apparently indifferent, 
especially if very slight in effect. The low rate of mutation observed is 
probably about as much as species can stand in view of the prevailingly 
injurious effects. The observed time rate of lethal mutation in the 
vinegar fly, with two weeks between generations, would bring about
extinction of the human species in one generation.

I shall pass rapidly over two factors of the greatest significance in 
making progressive evolution a conceivable process upon such unfa- 
vorable material as described above. First is the aggregation of numer- 










ous genes in the same cell, and the evolution of a mechanism of exact 
equational division of such an aggregation, the mechanism of mitosis. 
This is apparently absent in bacteria and blue-green algae but present 
in all higher plants and all animals, and is doubtless necessary for any 
high degree of organization. Associated with mitosis is sexual repro- 
duction, involving the union of half samples of the parental heredities. 
Biparental heredity makes it possible for a not too injurious mutation 
to be tried out in combination with all mutations carried by the species, 
and since it is really the combination and not the individual gene which 
is injurious or adaptive, it becomes possible that an initially injurious 
gene may find a place in an adaptive combination. Under uniparental 
reproduction, each mutation adds only one new type to the species. 
The occurrence of 100 different mutations means only 101 types. 
Under biparental reproduction each new mutation doubles the num- 
ber of possible gene combinations. One hundred different mutations 
means $2^{100}$ or about $10^{30}$ potential types. Compare this with 101 and
you will appreciate the enormously greater field of variation presented 
under biparental reproduction for the play of natural selection than 
under uniparental reproduction. The problem is for the species to
hold its slightly injurious mutations until it can work its way in some 
way through the nearly infinite field of gene combinations to the par- 
ticular combination which will mark an advance. Whether this can 
be brought about by natural selection alone is a moot question. Dr. 
R. A. Fisher, to whom I referred at the beginning, has made a mathe- 
matical investigation which leads to the conclusion that natural selec- 
tion is enough, that such selection must inevitably lead the species 
along the road of increasing fitness even in the minutest detail, assum- 
ing that the environment does not deteriorate. I have been led to 
somewhat different conclusions. 
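
The counting argument above can be checked directly. The sketch below assumes, as the text does, that each mutation is simply present or absent, so that k mutations give k + 1 types when each must stand alone and 2^k combinations when they may be freely combined. The function names are illustrative only.

```python
def uniparental_types(k: int) -> int:
    """The original type plus one new type per mutation."""
    return k + 1


def biparental_types(k: int) -> int:
    """Every subset of the k mutations is a possible combination."""
    return 2 ** k


k = 100
print(uniparental_types(k))            # 101
print(f"{biparental_types(k):.2e}")    # 1.27e+30
```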

In considering the problem, it is necessary to start with a conception 
of the differences between species different from that which is perhaps 
most usual. We are likely to think of a natural species as an assemblage
of individuals all homozygous for the same genes except for rare mu- 
tants. According to this view two species differ in that certain genes 
of one are replaced by allelomorphs in the other and the elementary 
evolutionary process is looked upon as the replacement of one gene by a 
mutation. It corresponds better with observation to assume that a 
species is made up of individuals no two of which are alike. Muta- 
tions have been occurring for millions of years, and each series of alle- 
lomorphs is typically represented in the species by more than one gene. 
What characterizes a species is a certain ratio in each series of allelomorphs.
The symmetry of the Mendelian mechanism insures the
constancy of such ratios in large populations, unless disturbed by evo- 
lutionary pressure. The elementary evolutionary process, according 
to this view, is merely change of gene frequency. It is conceivable 
that two species may have all of their genes in common. A difference 
in the frequencies of a large number of genes could well bring about 
such a differentiation that the probability would be indefinitely small 
that any individuals of one group could be mistaken for the other. 

The effects of various evolutionary pressures on gene frequency are 
easily deduced. In the case of a gene which is mutating with meas- 
urable frequency, the rate of change is, of course, directly proportional 
to the frequency, giving a straight line on a graph. Selection can 
have no effect on the frequency of a gene which is completely lost or 
fixed. The rate of change rises to a maximum at some point between, 
depending on dominance. In the case of unfavorable recurrent mutations,
the mutation pressure, tending to shift gene frequency in one direction,
is opposed to selection, tending to shift it in the other. At a
certain point, the two lines intersect. This is a point of equilibrium. 
Whatever the initial constitution of the population, there will be change 
until this equilibrium is reached. After this, evolution ceases as long 
as the conditions lead to the same mutation pressure and the same 
selection. The effects of migration can be treated similarly. Given 
constant conditions, evolution should occur only until every series of 
allelomorphic genes has reached the equilibrium determined by the
various evolutionary pressures. Change in conditions should be fol- 
lowed by systematic changes in all gene frequencies until all have 
reached the appropriate new positions of equilibrium. Return to the 
old conditions should be followed by return to the old equilibria. We 
have here reached a theory of specific stability, amid infinite individual 
variability, rather than a theory of evolution. And even the changes 
brought about by changes in conditions—more severe selection, for 
example—being reversible, are scarcely to be dignified by the term 
evolution. 
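
The approach to equilibrium described above can be illustrated numerically. The sketch below is not the author's derivation: it assumes a simple model in which recurrent mutation adds the gene at a rate v(1 - q) per generation and selection removes it at a rate sq(1 - q), so that the two pressures cancel at q = v/s. The values of v and s are arbitrary.

```python
def next_frequency(q: float, v: float, s: float) -> float:
    """One generation of change in gene frequency q.

    Assumed model (not taken from the paper): mutation adds the gene at
    rate v*(1-q), and selection removes it at rate s*q*(1-q).
    """
    mutation_pressure = v * (1.0 - q)
    selection_pressure = -s * q * (1.0 - q)
    return q + mutation_pressure + selection_pressure


def run(q0: float, v: float, s: float, generations: int) -> float:
    q = q0
    for _ in range(generations):
        q = next_frequency(q, v, s)
    return q


v, s = 1e-5, 1e-2          # illustrative values only
equilibrium = v / s        # where the two pressures cancel
for q0 in (0.0, 0.01, 0.5):
    print(q0, run(q0, v, s, 5000))
```

Whatever the starting frequency, the runs settle at the same value, which is the stability the paragraph describes.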

Our analysis so far, however, has certain limitations. We have 
really treated only the relations between a gene and a single allelo- 
morph. If, as is probable, each gene is capable of mutation through 
an indefinitely extended series of allelomorphic conditions, new ones 
may appear sufficiently favorable from the first to upset the equilib- 
rium, to make possible new combinations, to alter all of the selection 
coefficients and thus bring about a continuous, essentially irreversible 
process even under constant conditions. This seems to be Fisher’s 
scheme. The difficulty is the extreme rarity of new mutations favor- 
able from the start. 








It has seemed to me that another factor should be much more im- 
portant in keeping the system of gene frequencies from settling into 
equilibrium. This is the effect of random sampling in a breeding pop- 
ulation of limited size. The gene frequencies of one generation may 
be expected to differ a little from those of the preceding merely by 
chance. In the course of generations this may bring about important 
changes, although the farther the drift from the theoretical equilib- 
rium, the greater will be the pressure toward return. The resultant of 
these tendencies is a certain frequency distribution, or probability 
curve, for gene frequencies in place of a single equilibrium value. 
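
The effect of random sampling alone may be sketched as follows. This is a simplified illustration under stated assumptions, not the author's calculation: each generation draws 2n genes at random from the preceding one, and mutation, migration and selection (the pressures toward return) are left out entirely. The function name and parameter values are hypothetical.

```python
import random


def drift(q0: float, n: int, generations: int, seed: int = 0) -> list[float]:
    """Gene frequency over time in a breeding population of n individuals.

    Each generation draws 2n genes at random from the previous one
    (binomial sampling), with no mutation, migration, or selection.
    """
    rng = random.Random(seed)
    q = q0
    history = [q]
    for _ in range(generations):
        copies = sum(1 for _ in range(2 * n) if rng.random() < q)
        q = copies / (2 * n)
        history.append(q)
    return history


small = drift(q0=0.5, n=10, generations=2000)
large = drift(q0=0.5, n=2000, generations=100)
print(small[-1], large[-1])
```

In the small population the frequency wanders until the gene is fixed or lost; in the larger one it moves only slightly over the same kind of run.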

The most general solution which I have reached for the formula of 
this probability curve is as follows: 


$$y = C\,e^{4Nsq}\,q^{4N(mq_m+v)-1}\,(1-q)^{4N[m(1-q_m)+u]-1}.$$


It includes terms representing effective size of population (N), mutation
rate from (u) and to (v) the gene in question, amount of exchange
with other populations than that in question (m) and selection coefficient
(s such that the gene and its array of allelomorphs reproduce at
the rate $1 : 1-s$ respectively). Gene frequency is represented by q
(abscissa in charts) and that of the species as a whole by $q_m$. The
form of the curve and the consequent statistical situation in the popu- 
lation vary greatly, depending on the relative magnitudes of the 
coefficients. 
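
A minimal sketch for evaluating the curve is given below. It assumes the formula reads y = C e^{4Nsq} q^{4N(m q_m + v) - 1} (1 - q)^{4N[m(1 - q_m) + u] - 1}, with C left as an arbitrary scale factor; the function name, default arguments and parameter values are illustrative and not taken from the paper.

```python
import math


def gene_frequency_curve(q, N, s=0.0, m=0.0, q_m=0.5, u=0.0, v=0.0, C=1.0):
    """Ordinate y of the probability curve at gene frequency q (0 < q < 1).

    Assumed reading of the printed formula:
        y = C * exp(4*N*s*q) * q**(4*N*(m*q_m + v) - 1)
                             * (1 - q)**(4*N*(m*(1 - q_m) + u) - 1)
    C is left as an arbitrary scale factor rather than normalized.
    """
    a = 4 * N * (m * q_m + v) - 1
    b = 4 * N * (m * (1 - q_m) + u) - 1
    return C * math.exp(4 * N * s * q) * q ** a * (1 - q) ** b


# Illustrative parameter values only.
for q in (0.1, 0.5, 0.9):
    print(q, gene_frequency_curve(q, N=1000, s=0.001, u=1e-4, v=1e-4))
```

With every pressure set to zero the exponents both fall to -1, which gives the U-shaped form 1/[q(1 - q)] that belongs to a small population.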

The bearing on evolution can perhaps best be brought out by com- 
paring certain extreme and intermediate cases. Chart I represents 
the situation in a small population. The probability array for genes is 
approximately of the form $y = Cq^{-1}(1-q)^{-1}$. Most genes drift at random
into complete fixation or loss, bringing the well known effects of
close inbreeding. The population is extremely uniform. Only very 
rarely is an old gene replaced by a new one. Even severe selection has 
little effect, and the fixation process, being random, is in general in- 
jurious. The end can only be extinction for a group, permanently 
reduced below a certain size of population (in relation to other evo- 
lutionary factors). 

Thus a small population is not favorable to evolution. Consider the 
opposite extreme, a very large population (see Chart III). Here 
random variation of gene frequencies is a negligible factor and we have 
the situation considered first. Each gene is held in equilibrium at a 
certain frequency determined by selection, mutation, etc. Although
the variability of the population may be great, the genes of different
series combining in different ways probably in every individual, the
average condition remains the same as long as conditions are constant,
subject to the possibility, already discussed, that wholly novel favorable
mutations may disturb the situation. A change in conditions,
such as more severe selection, may rapidly change the average of the
population, but the change is at the expense of the store of variability of
the species and compromises evolutionary advance for a long time following,
since there is no escape from fixation except through the slow
process of mutation. As previously noted, such change is of an essentially
reversible sort and thus not really of an evolutionary character.

[Charts I to IV: probability curves of gene frequency q (abscissa, 0 to 1.0); curves not reproduced.]

Thus it seems that neither a small nor a large freely interbreeding 
population offers an adequate basis for a continuing evolutionary 
process. How is it with a population of intermediate size, defined as
one in which the reciprocal of population size, the selection coefficient
and mutation rate are all of about the same order? As shown in
Chart II, gene frequencies drift at random about their equilibrium 
values, but not to the point of fixation as in a smaller population. 
Just because the direction of drift is accidental, the result is a kaleido- 
scopic shifting of the average characters of the population through 
predominant types which practically are never repeated. The selec- 
tion coefficients cannot be expected to be constant under such condi- 
tions. It is the organism as a whole that is selected, not the individual 
genes, and a gene favored in one combination may be unfavorable in 
another. Thus the probability arrays themselves will be constantly 
changing—some moving to the right and closing up, others to the left 
and opening out, some to the extreme left and loss. A continuous and 
irreversible, though primarily non-adaptive, evolutionary process will 
take place even under constant conditions. The rate, however, is 
limited by mutation rate and hundreds of thousands of generations are 
required for important evolutionary changes. Nevertheless this case 
seems the most rapid to be considered so far. 

We have not dealt wholly fairly with the case of large species. As 
size of population increases, the tendency to spread out and break up 
into partially isolated groups increases. Each sub-group has a system 
of frequency arrays for the genes in which they drift about at random 
and at rates determined, not by the size of the whole species and mu- 
tation pressure, but by the size of the sub-group and migration pres- 
sure. (See Chart IV.) The rate of decrease in heterozygosis due to 
size of population is 1/2N, where N is the effective size of the breeding 
population. The result is a geologically rapid drifting apart of the various
sub-groups, even under uniform conditions. This is a non-adaptive
radiation, but, on the average, not such as to lead to appreciable deterio- 
ration. Exceptionally favorable combinations of genes may come to 
predominate in some of the sub-groups. These may be expected to ex- 
pand their ranges while others dwindle. This process of intergroup 
selection may be very rapid as compared with mass selection of indi- 
viduals, among whom favorable combinations are broken up by the 
reduction-fertilization mechanism in the next generation after formation.
With partial isolation and differentiation accompanying expansion
of the successful sub-groups, the process may go on indefinitely.
In short this seems from statistical considerations to be the only mech- 
anism which offers an adequate basis for a continuous and progressive 
evolutionary process. It may be added that when tested by observa- 
tion, it accords excellently with the actual situation found among natural 
species. It agrees well with the views reached by many field biologists. 
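
The rate of 1/2N quoted above for the decrease in heterozygosis lends itself to a short calculation. The sketch assumes a constant effective size N and no mutation or migration, so that a fraction 1 - 1/2N of the heterozygosis remains after each generation. The function names are illustrative.

```python
import math


def heterozygosis_remaining(n: int, generations: int) -> float:
    """Fraction of the initial heterozygosis left after the given number
    of generations, losing 1/(2n) of it each generation."""
    return (1.0 - 1.0 / (2 * n)) ** generations


def generations_to_halve(n: int) -> float:
    """Generations needed for heterozygosis to fall to one half."""
    return math.log(0.5) / math.log(1.0 - 1.0 / (2 * n))


for n in (10, 100, 10_000):
    print(n, round(generations_to_halve(n), 1))
```

The halving time grows nearly in proportion to N, which is why drift is governed by the size of the sub-group rather than of the whole species.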

The final conclusion to which this analysis leads seems to be as fol- 
lows: The conditions favorable to progressive evolution as a process of 
cumulative change are neither extreme mutation, extreme selection, 
extreme hybridization nor any other extreme, but rather a certain bal- 
ance between conditions which make for genetic homogeneity and ge- 
netic heterogeneity. Such a situation means on the one hand the re- 
tention of a great store of variability in the population and on the other 
hand a random drifting of the mean grade of all characters, leading, 
occasionally by chance, to the attainment of exceptionally favorable 
gene combinations. In particular, a state of sub-division of a sexually 
reproducing population into small, incompletely isolated groups pro- 
vides the most favorable condition, not merely for branching of the 
species, but also for its evolution as a single group. 













THE LUMINOSITY OF THE STARS 


By Jan Schilt


If things are haphazard there is no problem for science. It is only 
the discovery of some order that starts scientific thinking. The first 
thing that strikes the observer in a clear night sky is the multitude of 
stars, but almost at once he will realize that some of them are con- 
spicuously bright and that certain groups exhibit some regularity in 
their positions. With this observation the science of the stars was 
started; astronomers gave names to outstanding objects, which we find 
in the oldest records in every culture. For thousands of years the law 
of the stars—that they are invariable, fixed and eternal—was chal- 
lenged again and again by astronomers who carefully checked up on 
the star positions but failed to find enough evidence to discredit this 
law. In the beginning of the eighteenth century Halley pointed out 
that several stars had moved since the time of the Almagest, sixteen 
centuries before. From then on the stars began to be of scientific 
interest in a rapidly increasing degree, due to the availability of more 
and more observables. 

Science cannot proceed beyond a certain stage, set by the accuracy 
of the observations, and the number of observable things. The first 
task is always for the observer. He is satisfied when his measure- 
ments, repeated under the same circumstances, give the same results. 
Next, the statistician coordinates the observed data with the purpose
of finding order. Suppose, for instance, that we have as observed ma- 
terial the length of human life. The curve of the frequency distribu- 
tion for this phenomenon is rather wide; in other words, the dispersion 
is large. Several laws of order have been recognized which permit us 
to replace the original frequency curve by a number of component 
curves relating to such characteristics as sex, occupation, etc. By 
refined coordinating with respect to more and more observables it is possible
to obtain smaller and smaller dispersion, i.e., more accurate prediction,
until the statistician has exhausted all his observables. He
will then say: “My prediction is such but in half of the cases I will be
more than so much in error, which depends on chance.” By this he
means that the residuals of the observations after he has taken out
every trace of order are haphazard. It is left to the theoretical scien- 
tist to interpret by a formula the order extracted by the statistician 
from the observer’s results. 
The light of a star is measured in units of a logarithmic scale which
are called magnitudes, one magnitude corresponding to a light ratio
of 2.51... to 1. The magnitude of a star depends on its radiating
power and on its distance. If there is no light absorption between 
the stars and us, the light will be as the inverse square of the distance, 
and as soon as we know the distance we can by a simple formula reduce 
the magnitude to what is called the absolute magnitude. This is the 
magnitude that a star would appear to have if it were placed at a 
standard distance of ten parsecs. The distance is mostly defined by 
the small angle included by the directions to the object from two 
points on a base line. As standard base we use the distance of the earth 
from the sun, and the small angle is called the parallax. About the 
measurements of stellar parallaxes I shall say only that they have to be 
relative to faint stars in the background. The determination of a 
parallax thus presupposes the knowledge of the statistical mean paral- 
lax of the background stars, and this quantity is especially important 
if the measured parallaxes are small. Statistical parallaxes are derived
from proper motions and radial velocities. The proper motions of the 
stars in some constellations appear to be haphazard, in others there is 
a certain parallelism, but if we take the mean proper motions in areas 
all over the sky we see at once a distinct order, which, as first pointed 
out by Herschel, shows that the stars move away from a point in Her- 
cules toward a point in the Hare—the equivalent of saying that the sun 
moves toward the point in Hercules. The mean angular displacements 
are complemented by the mean velocities measured in the Doppler 
effect in the spectra. The quotient of the mean unforeshortened values 
gives the mean parallax, which for stars higher than magnitude 6 
amounts to 0″.013.
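
The relations used in this paragraph can be written out as a short sketch. It assumes no absorption of light in space, takes the distance in parsecs as the reciprocal of the parallax in seconds of arc, and uses the standard reduction M = m + 5 + 5 log10(parallax); the function names are illustrative.

```python
import math

# One magnitude is a light ratio of 100 ** (1/5), about 2.512.
RATIO_PER_MAGNITUDE = 100 ** 0.2


def light_ratio(magnitude_difference: float) -> float:
    return RATIO_PER_MAGNITUDE ** magnitude_difference


def distance_parsecs(parallax_arcsec: float) -> float:
    return 1.0 / parallax_arcsec


def absolute_magnitude(apparent_magnitude: float, parallax_arcsec: float) -> float:
    """Magnitude the star would show at ten parsecs, assuming no absorption."""
    return apparent_magnitude + 5.0 + 5.0 * math.log10(parallax_arcsec)


print(round(light_ratio(1), 3))
print(round(distance_parsecs(0.013), 1))
print(round(absolute_magnitude(6.0, 0.013), 2))
```

A star at exactly ten parsecs (parallax 0″.1) has equal apparent and absolute magnitudes, which is a convenient check on the formula.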

The way is now cleared for the derivation of the frequency of stars 
of a given absolute magnitude. How many among the stars are 
brighter than the sun, how many are 100 times brighter? The fre- 
quency curve which represents this distribution is called the luminosity 
curve, and its determination runs as follows. Supposing, first, that we
know the individual parallaxes, we may arrange the stars in groups be- 
tween given limits of parallax. If these limits are not too wide the 
stars in one group can be considered as all having the same parallax or 
the same distance. The reduction from apparent magnitude to abso- 
lute magnitude is then a constant for each group and since the volume 
of space occupied by the group is given, we can compute the numbers 
of stars of each absolute magnitude per unit of space. In dealing with 
distant stars we must make use of the statistical parallaxes. If we be- 
gin by grouping the stars according to magnitude and proper motion, 
we obtain for the different groups widely different statistical parallaxes,
and the dispersion in distance inside each group becomes comparatively 
small. This is because of the fact that proper motion and magnitude 
are indicators of distance. Whereas a particularly large proper motion 
may be due to an exorbitant velocity, the average proper motion will 
decrease as the distance becomes larger. In the same way a bright 
star is often absolutely bright but the mean magnitude will increase 
with the distance. Kapteyn was the first to compute the mean paral- 
laxes for a number of groups so selected. Each star is subsequently 
treated as if its actual parallax were equal to the mean of its group, and 
the further procedure is now equivalent to the one described for the 
cases in which the individual parallaxes are known. The luminosity 
curve shows that in $10^{6}$ cubic parsecs there are about 3,000 stars of
the same absolute magnitude (+5) as the sun; there are fifty of absolute
magnitude 0, which is 100 times the sun’s light; and there are two
or three stars in every $10^{6}$ cubic parsecs, having as much as $10^{4}$ times
the sun’s power. The fainter stars are more numerous than those of 
the sun’s type. 
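
The grouping procedure described above may be sketched as follows. This is an illustration under stated assumptions rather than the computation actually carried out: the stars are taken to be a complete census of the whole sky between two parallax limits, every star in the group is reduced with the group's mean parallax, and absolute magnitudes are counted in whole-magnitude classes. The data are made up.

```python
import math
from collections import Counter


def luminosity_counts(stars, parallax_limits):
    """Stars per cubic parsec in each whole absolute magnitude.

    stars: (apparent magnitude, parallax in arcsec) pairs, assumed to be
        a complete census of the whole sky between the parallax limits.
    parallax_limits: (smaller, larger) parallax bounding the group.
    """
    p_min, p_max = parallax_limits
    group = [(m, p) for m, p in stars if p_min <= p < p_max]
    if not group:
        return {}

    # Treat every star in the group as lying at the group's mean parallax.
    mean_parallax = sum(p for _, p in group) / len(group)
    reduction = 5.0 + 5.0 * math.log10(mean_parallax)

    # Volume of the shell between the two limiting distances (parsecs).
    d_near, d_far = 1.0 / p_max, 1.0 / p_min
    volume = 4.0 / 3.0 * math.pi * (d_far ** 3 - d_near ** 3)

    counts = Counter(round(m + reduction) for m, _ in group)
    return {M: n / volume for M, n in sorted(counts.items())}


# Made-up stars, for illustration only.
sample = [(4.8, 0.105), (5.1, 0.098), (9.9, 0.101), (0.2, 0.110)]
print(luminosity_counts(sample, (0.09, 0.12)))
```

With statistical parallaxes the same routine applies, each group being defined by magnitude and proper motion instead of by measured parallax.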

The luminosity curve beyond +5 shows a slow rise and seems to 
become stationary. In fact, Kapteyn thought that for the still fainter 
stars the curve went down again, but the last points are rather uncer- 
tain. They represent stars of only .01 to .0001 of the light of the sun. 
However abundant such stars are in $10^{6}$ cubic parsecs, we can observe
only a few of them, just because they are so faint. There is, however, 
one circumstance that helps to find out the very faint stars. On ac- 
count of their small distances they usually have large proper motions. 
If a very faint star has a large proper motion the parallax is measured. 
Systematic searches for such stars have actually revealed twenty-eight 
stars fainter than absolute magnitude +8 within a radius of five par- 
secs from the sun, several of which are companions of brighter stars. 
Within the same distance there are only four stars brighter than the 
sun. If the immediate neighborhood of the sun may be regarded as a 
fair sample, we must conclude with Seares that the luminosity curve 
goes up for the absolutely faint stars. Whether it will go down even- 
tually for the still fainter stars, is not yet shown by observations. 
Consequently, the total number of stars per unit of volume cannot yet 
be estimated even vaguely. All we can obtain is an estimate of the 
number of stars down to a limit of absolute magnitude. Seares finds 
that there are forty-two stars per 1,000 cubic parsecs if we put the limit 
of absolute magnitude at +10, and about 100 stars if we take +15 as 
the limit of magnitude. 

So far we have used the stars irrespective of the spectral type. From 
the spectrum we learn two important constants pertaining to conditions
in the stellar surface: (1) the temperature, from the distribution
of energy in the continuous spectrum; (2) the degree of ionization, from 
the relative intensities of the spectral lines. The ionization depends 
chiefly on the temperature but also on the density of the gas. The 
spectral classes are designated by the familiar symbols B, A, F, G, K, 
M, and some others which are rare. In this order they represent a 
decreasing degree of ionization, and since the ionization depends largely 
on the temperature, the fact is that the spectral sequence arranges the 
stars according to temperatures. The B and A stars are hot (10,000° 
to 20,000°) and of white color; the M stars are comparatively cool 
(3,000°) and of reddish color. It is obviously desirable to derive sepa- 
rate luminosity curves for the stars belonging to different spectral types. 

For the B stars, which have high temperature, the luminosity curve 
really does show a maximum and the frequency goes down at absolute 
magnitude +2 and fainter. For the cooler K stars, on the other 
hand, the curve reaches first a maximum; after this it goes down and 
then up again. The intermediate types show intermediate curves. 
If now we go back to the original curve for all stars together, we see 
that the portion covering the bright stars is made up of hot stars as well 
as cool stars, but that the portion covering the faint stars is entirely due 
to the cool stars. This result, derived entirely statistically, fits in 
with the important discovery by Hertzsprung that there are among 
the cooler stars giants and dwarfs. The statistical luminosity curve 
shows convincingly that there is a lack of intermediate stars. If we 
omit the giants, the rest of the stars show a striking almost one-to-one 
relation between color and absolute magnitude. In order to eliminate 
uncertainties in the distances and consequently in the absolute mag- 
nitudes it stands to reason that we ought to examine the faint stars 
which are members of a cluster. Such stars can be considered as being 
at the same distance. 

Hertzsprung’s diagram for the Pleiades presents a case where there 
are no giants. All the stars lie in one neat band. More or less similar 
diagrams exist for a number of clusters. In the Hyades, there are a 
few giants which occupy an isolated place, but the bulk of the stars is 
always in the main sequence. By using the parallaxes of the nearest 
stars the diagram obtained is especially rich in absolutely faint stars. 
In this case there appears a new group which I have not mentioned so 
far. It is situated in the portion of the diagrams which represents high
temperature and low absolute magnitude. They are called white
dwarfs and are extremely faint. For this reason they do not show up
in sufficient numbers (only four or five are known) to manifest them- 
selves in the statistical luminosity curve. 




















If we want to take in the giants we have to resort to distant stars, 
and directly measured parallaxes are of no avail. Several efforts have 
been made to modify Kapteyn’s method of mean parallaxes in the hope 
of finding more detail in the luminosity curve. A recent and rather 
successful attempt by Strömberg should be mentioned.

Instead of just the mean values of the observed velocities and proper 
motions the distribution of the frequencies may be utilized. The fre- 
quency distribution of the velocities is known. If all stars were at the 
same distance, the frequency distribution of the proper motions would 
be equivalent to the distribution of the velocities. As it is, the distri- 
bution of the proper motions is different because of the spread in dis- 
tance. If the spread in distance is known, we can easily derive the 
velocity distribution from the proper motions. The reverse process 
of deriving the spread in distance from the observed proper motions is 
mathematically not easy, but the problem is definite, and that is suffi- 
cient for a computation. The frequency distribution of the proper 
motions is transformed by means of some luminosity curve into a ve- 
locity distribution. This velocity distribution is in general different 
from the observed one, and the luminosity curve first adopted is then 
changed until, by trial and error, the observed distribution is well 
represented. In this way Strömberg has found that the luminosity
curve for the giants does not show one broad maximum, but two or 
even three separate maxima. In the curve of the M stars there are 
two maxima of nearly equal abundance: giants and supergiants. For 
the K stars the bulk of the giants are in one maximum, but two sec- 
ondary maxima stand out, indicating the presence of supergiants and 
of what Strömberg calls subgiants. The latter group is expected to be
more conspicuous in the G stars, but these results have not yet been 
published. 
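
The relation on which this method rests can be shown in a few lines. The sketch below is not Strömberg's procedure; it only illustrates, with made-up numbers, that proper motions at a common distance are velocities on a different scale, while a spread in distance alters the distribution. It uses the standard conversion of 4.74 km/sec for one astronomical unit per year.

```python
KM_S_PER_AU_YR = 4.74  # one astronomical unit per year, in km/s


def tangential_velocity(proper_motion_arcsec_yr: float, parallax_arcsec: float) -> float:
    """Velocity across the line of sight in km/s."""
    return KM_S_PER_AU_YR * proper_motion_arcsec_yr / parallax_arcsec


# Made-up proper motions (arcsec per year), for illustration only.
proper_motions = [0.02, 0.05, 0.10, 0.30]

# At a single common distance every proper motion is scaled by the same
# factor, so the two frequency distributions have the same shape.
same_distance = [tangential_velocity(mu, 0.01) for mu in proper_motions]

# With a spread in distance the scaling differs from star to star.
parallaxes = [0.005, 0.01, 0.02, 0.04]
spread = [tangential_velocity(mu, p) for mu, p in zip(proper_motions, parallaxes)]

print(same_distance)
print(spread)
```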

It appears that the stars favor certain values of absolute magnitude. 
These favored values depend on the spectral types and on the colors, 
or in general on the prevailing conditions in the stellar surfaces. The 
remarkable feature is that the size of the stars apparently does not enter 
as an independent variable. There are at the present time a number 
of theories to interpret the statistical laws of stellar luminosity. The 
multitude of theories, however, is an indication that the task of the 
statistician is not yet finished. If asked in which direction further 
development will be, I would say that some new progress is to be ex- 
pected from a detailed analysis of the velocities of the stars. 










APPLICATIONS OF STATISTICAL METHOD IN 
ENGINEERING 


By Walter A. Shewhart


I have been invited to indicate briefly what an engineer can do with 
the aid of statistical method that he cannot do without it. 

There are many ways of doing this. I might discuss some of the de- 
tailed problems of a civil engineer, such as the determination of the 
maximum run-off of a flood area; the work of Westman of the Ontario 
Research Foundation in the application of statistical theory in ceramic 
engineering; the work of Hayes and Passano of the American Rolling 
Mills Company in the study of corrosion; the investigations of Wilson 
and others of the Forest Products Laboratory; the researches of 
Becker, Plaut, Runge and Daeves in the production of steel in Ger- 
many; or the work of any one of a number of others, including many of 
my colleagues in the Bell System who are applying statistical theory in 
the solution of their engineering problems. Applications have been 
made in every field of engineering. 

However, to consider only the problems to which statistical method 
has been applied, would be to leave out of the picture some of the most 
important applications of the theory in engineering work. This is 
particularly true since industry is only now beginning to appreciate 
many of the important applications that can be made in connection 
with the control of quality of manufactured product. I have discussed 
elsewhere¹ in detail the nature of the applications of statistical method
to research, development, design, production, inspection and supply so 
that any one interested in such details can easily avail himself of this 
information. Today I shall limit myself to a consideration of two 
fundamental concepts which characterize the present era of scientific 
development and necessitate the revision of previously accepted meth- 
ods of presenting, interpreting and using all kinds of engineering data. 

For a long time engineers based their developments upon two funda- 
mental assumptions: (1) A physical quantity is a true fixed value, 
whereas measurements of this quantity differ from the true value be- 
cause of ever present errors of measurement; (2) A physical law is a 
functional relationship of the mathematical type, whereas observed re- 
lationships are always influenced by ever present errors of measure- 
ment. 


¹ W. A. Shewhart, “Economic Aspects of Engineering Applications of Statistical Methods,” Journal
of the Franklin Institute, Vol. 205, March, 1928, pp. 395-405. 




















In other words, even though engineers have heard much about statistical
methods and their application in education, sociology, economics,
etc., they have been inclined to stand aloof and say: “Well, these
methods may be all right for the fellow who deals with such an inexact 
science as education or economics, but, thank goodness, we do not have 
to depend upon their use because we are dealing with the application of 
exact sciences, such as physics and chemistry.” 

Of course, they have been willing to admit that the theory of errors— 
a special form of the statistical method—has enabled them as engineers 
to do many things that they could not have done otherwise. For ex- 
ample, such a theory gives a rational basis for estimating the error of 
estimate of the assumed true but unknown quantity. With the de- 
velopment of this theory, it became possible to determine how many 
measurements should be made in order to reduce the error of an esti- 
mate of the assumed true value to any preassigned magnitude. 
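
As a minimal sketch of the calculation just mentioned, suppose the measurements are independent and the standard deviation of a single measurement is known. The standard error of their mean is then sigma divided by the square root of n, and the number of measurements follows at once. The function names and figures are illustrative only.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent measurements."""
    return sigma / math.sqrt(n)


def measurements_needed(sigma: float, target_error: float) -> int:
    """Number of measurements n for which sigma / sqrt(n) <= target_error."""
    return math.ceil((sigma / target_error) ** 2)


sigma = 2.0     # assumed standard deviation of a single measurement
target = 0.5    # preassigned error of the estimate
n = measurements_needed(sigma, target)
print(n, standard_error(sigma, n))
```

Halving the preassigned error calls for four times as many measurements.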

On the basis of these older assumptions, the theory had even more 
important applications than those just mentioned. Most physical and 
engineering measurements are indirect. The magnitude of the quan- 
tity to be measured is expressed in terms of measurements of some m 
other quantities to which it is assumed to be functionally related. In 
all such cases the theory of errors makes it possible to determine the ef- 
fect of errors of measurement in any one of the m quantities upon the 
resultant error and to choose the best of several possible methods of 
measurement. This contribution has gone a long way toward increas- 
ing the efficiency of industrial research. 
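
The effect of errors in the measured quantities upon the resultant error may be sketched as below. The sketch assumes independent errors that are small enough for a first-order treatment, and estimates each partial derivative numerically. The resistance example and all names are hypothetical.

```python
import math


def propagated_error(f, values, errors, step=1e-6):
    """First-order error in f(*values) from independent errors in its inputs.

    Uses sqrt(sum((df/dx_i * e_i)**2)), with each partial derivative
    estimated by a central difference.
    """
    total = 0.0
    for i, (x, e) in enumerate(zip(values, errors)):
        h = step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        partial = (f(*up) - f(*down)) / (2 * h)
        total += (partial * e) ** 2
    return math.sqrt(total)


def resistance(voltage, current):
    # Hypothetical indirect measurement: R = V / I.
    return voltage / current


# Illustrative readings: 10 V +/- 0.1 V and 2 A +/- 0.05 A.
print(propagated_error(resistance, [10.0, 2.0], [0.1, 0.05]))
```

Comparing the separate terms of the sum shows which measured quantity contributes most to the resultant error, and hence which method of measurement is to be preferred.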

Scientists and engineers alike have made use of error theory in 
helping them to find or fix upon the objective functional relationship 
assumed to exist between two or more quantities or, in other words, to 
find or fix upon the objective laws of nature. Perhaps the most im- 
portant criterion developed for this purpose is the method of least 
squares. 
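
For the simplest case of a straight line, the method of least squares reduces to the short calculation sketched below. It assumes that the errors lie wholly in the ordinates; the observations are made up for illustration.

```python
def fit_line(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations scattered about y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
print(fit_line(xs, ys))
```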

Thus we see that statistical method has played an important role in
engineering from the very beginning of the period of industrial expan- 
sion based upon the applications of scientific principles. We are, how- 
ever, living in the dawn of a new era; an era in which there is every 
reason to believe that the applications of statistical method in the past 
will be overshadowed by far more important applications in the future. 

We hear distinct rumblings of a revolution in the camp of the “exact”
sciences. The concept of exact is overthrown for the moment at
least, and in its place statistical concepts hold sway. 

But what of it? We as engineers have grown accustomed within
recent years to cataclysmic upheavals in physical theory. Few of these
have reached the height of general interest attained by the theory of
relativity; and yet, after everything has been said and done, how many
of us today have changed our engineering practices because of relativity?
True it is, relativity has its place, but as yet it is not a useful
tool for most of us. May it not be, therefore, that all of this new interest
in statistical theory is but a tempest in a teapot so far as it touches
us who are concerned primarily with those things which necessitate
significant changes in our utilitarian mode of thinking?

Let us therefore consider two of the changes which have come into our
fundamental scientific concepts which indicate that the applications of
statistical method in engineering have come to stay: I refer to the
statistical nature of physical properties and physical laws. Engineering
development is based upon the use of such properties and laws.
Hence, the substitution of statistical for “exact” concepts in this field
is of vital engineering significance, as we shall now see.

A. Statistical Nature of Physical Properties—What engineer is not 
interested in the physical properties of materials? What engineer does 
not have about him a table of the so-called physical and chemical 
constants? Yet these so-called constants are really not constant. This 
idea of constancy holds over from the older concept of a physical 
quantity as a true, fixed, objective value. 

Let us see why this new concept is of importance. In the first place 
it makes it necessary for us to revise our picture of a property of a 
material. To make our discussion specific, let us fix our attention on a 
metal bar. What is the density of the metal in that bar? If we take this
question to mean point to point density, the answer is indeterminate. 
Density has meaning only as a statistical average of indeterminate 
point to point densities. In fact, the density of material of this kind is 
not an objective constant but rather an objective distribution function. 

Suppose we were to take 1,000 similar bars and were to break them to 
determine their tensile strength. We would find that the observed 
values differed not alone because of error of measurement, but primarily 
because the tensile strength of bars, as nearly alike as we know how to 
make them, is a distribution function and not a fixed true value. We 
cannot say, therefore, that the tensile strength—or any other physical 
property of a material—is a certain fixed value. Instead all that we 
can ever hope to say is that a certain proportion of a given kind of 
material, essentially the same so far as we can determine, will have a 
tensile strength—or other physical property—lying within a specified 
range. The difference between the old and the new concept is shown 
schematically in Chart I. 
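
The kind of statement the new concept permits can be sketched as follows. The paper does not say what form the distribution function takes; the sketch assumes a normal form purely for illustration, and the mean, standard deviation and limits are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def proportion_within(mean: float, sd: float, lower: float, upper: float) -> float:
    """Proportion of material expected between two limits (normal form assumed)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def observed_proportion_within(values: Sequence[float], lower: float, upper: float) -> float:
    """The same proportion taken directly from observed test results."""
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)


# Hypothetical tensile strengths in arbitrary units.
expected = proportion_within(mean=60.0, sd=4.0, lower=56.0, upper=64.0)
```

The second function needs no assumption about the form of the distribution, at the price of requiring many broken bars.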

CHART I
[Schematic comparing the old and the new concept. Legend: true magnitude of quality and observed magnitude; true distribution of quality and observed quality.]

Since objective physical properties are distribution functions, standards
for such properties should be distribution functions. Now, of
course, if the coefficients of variation for most physical properties were
small, the engineering importance of the statistical nature of such
properties would not be so great as it is under the conditions that
actually exist, as we shall show by a simple example.

An important property of material is that indicating strength. 
Thus for wood an engineer is interested in its modulus of rupture. If 
he turns to an engineering handbook or almost any other source, he 
usually finds a single figure recorded for each species of wood. One 
such table of data taken from a standard book is shown below. Is the 
engineer to draw the conclusion from such a table that every specimen 
of long-leaf pine, for example, has a greater modulus of rupture than a 
specimen of any other species cited in this table? So far as the table is 
concerned, he might justly draw such a conclusion. 


TABLE I 


Species                         Modulus of rupture
[name illegible in source]      7110
[name illegible in source]      8280
[name illegible in source]      6685
[name illegible in source]      7870
[name illegible in source]      8380
[name illegible in source]      5173
[name illegible in source]      5900
[name illegible in source]      6980


Chart II shows why, in such a case, the average does not tell the
whole story. This chart gives approximate standard distribution
functions for the modulus of rupture of round timbers from four species.
It is apparent that the variability is large compared with the mean
modulus of rupture. In fact some pieces from each species will have
the same modulus of rupture. Hence, what the engineer needs is a
working approximation to this standard distribution function that will
tell him approximately the proportion of the pieces of material of a
given species that may be expected to have a modulus of rupture
within any two fixed limits. Certainly a single value of the modulus
of rupture for any one of the four species shown in Chart II falls far
short of giving the desired information.

CHART II
[Four panels: Species A, Species B, Species C, Species D. Horizontal scale from 0 to 13 in each panel.]
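
How misleading a table of single figures can be may be sketched numerically. Assuming, for illustration only, that the modulus of rupture in each of two species is normally distributed and that specimens are drawn independently, the chance that a specimen of the species with the lower average is nevertheless the stronger follows from the distribution of the difference. The means and standard deviations below are hypothetical and are not read from Table I or Chart II.

```python
import math
from statistics import NormalDist


def prob_b_exceeds_a(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """P(specimen from B is stronger than specimen from A).

    Assumes independent, normally distributed strengths; the difference
    B - A is then normal with the variances added.
    """
    difference = NormalDist(mean_b - mean_a, math.sqrt(sd_a**2 + sd_b**2))
    return 1.0 - difference.cdf(0.0)


# Hypothetical species: A averages 8.0 and B averages 7.0 (arbitrary
# units), each with a standard deviation of 1.5.
p_reversal = prob_b_exceeds_a(8.0, 1.5, 7.0, 1.5)
```

With these assumed figures the weaker species on average supplies the stronger specimen in roughly three comparisons out of ten, which a table of means alone would never suggest.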

What is true in the case of the modulus of rupture of wood is true of 
many of the most important physical properties of materials—the 
statistical nature of physical properties is of engineering significance.

Speaking of the modulus of rupture of telephone poles naturally 
brings up the sampling problem involved in giving assurance that the 
standard of quality of a material is being maintained, particularly in 
those cases where, as in the determination of the modulus of rupture of 
telephone poles, the test is destructive. 

To be specific, how would you choose a sample of 100 poles from a 
pole yard for the purpose of giving us the best information as to the 
quality of the poles in this yard in respect to modulus of rupture? To 
answer this kind of question, the engineer finds it necessary not only to 
make use of available sampling theory but to extend this theory. 
Without such theory, he has no rational basis of assurance that the 
quality of product is being maintained. 

When we stop to think that some of the most important qualities of 
everything that we use must be maintained upon a sampling basis 
because of the destructive nature of tests, we get a glimpse of the im- 
portance of sampling theory of this kind. Three specific illustrations 
are the fuse that protects your home, the steering rod of your car and 
everything that you eat. 

B. Statistical Nature of Physical Laws.—No longer do we believe 
that relationships between physical quantities are functional in the 
mathematical sense. Instead, we think of them as being statistical. 
The contrast between the old and the new concept is indicated sche- 
matically in Chart III. 

As a specific illustration, let us consider the relationship between
tensile strength and hardness for some metal such as steel. The older
concept assumed the existence of a functional relationship between
hardness and tensile strength, represented by a curve showing a one-to-one
correspondence between these two properties, as schematically
illustrated in the left half of Chart III. Observed deviations from
this hypothetical curve were attributed to errors of measurement.
In fact, many calibration curves of tensile strength in terms of hardness
are based upon this older concept.

CHART III
[Two panels, Old and New, each with axes X and Y. Legend: line, exact functional relationship; points, statistical relationship.]

Today, however, we look at this situation in a different light. No 
longer do we believe that there is a one-to-one correspondence between 
such properties. Instead, we believe that there is only a statistical 
relationship of pairs of values of two such quality characteristics cor- 
responding to all possible samples of what we assume to be essentially 
the same material. This situation is represented schematically in the 
right half of Chart III. No longer then, in such a case, are we free to 
treat the deviations of an observed set of points from any curve of 
best fit as errors of measurement. 

Suppose that you have, let us say, 1,000 metal bars, all of which have 
been made under the same essential conditions so far as it is possible to 
do this. Suppose that these bars are to be used in a piece of apparatus 
where it is desirable to insure that the tensile strength lies between two
preassigned limits $Y_1$ and $Y_2$. How would you divide the bars into
two piles—one that meets the limits and one that does not? 

Obviously you cannot test these bars by breaking them, for then you 
would have no bars left. In fact you must resort to the use of some 
statistically correlated variable, such as hardness, as a measure of tensile
strength. Immediately, however, you are confronted with the
fact that the inherent indeterminateness of such a statistical measure
makes it impossible to set up any two limits $X_1$ and $X_2$ on hardness
such that one can be assured that the pieces of the material having
values of hardness within these two limits will have tensile strengths
within the specified range $Y_1$ to $Y_2$. The situation is shown schematically
in Chart IV.

Even when the two variables, tensile strength and hardness, are 
truly correlated, the best that one can hope to do is to say that the probability
that a piece of material having a hardness between $X_1$ and $X_2$
will have a tensile strength between $Y_1$ and $Y_2$ is some fixed value p’.
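
The probability p’ can be illustrated by simulation. The sketch below assumes that hardness and tensile strength, each expressed in standard units, follow a bivariate normal distribution with correlation `rho`; that assumption, the limits and the function name are all hypothetical, and the paper itself notes that the theory required goes beyond what was then available.

```python
import math
import random
from typing import Tuple


def conditional_probability(
    rho: float,
    x_limits: Tuple[float, float],
    y_limits: Tuple[float, float],
    trials: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(Y1 <= Y <= Y2 given X1 <= X <= X2).

    X is hardness and Y is tensile strength, both in standard units,
    drawn from a bivariate normal distribution with correlation rho.
    """
    rng = random.Random(seed)
    in_x = in_both = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if x_limits[0] <= x <= x_limits[1]:
            in_x += 1
            if y_limits[0] <= y <= y_limits[1]:
                in_both += 1
    if in_x == 0:
        raise ValueError("no simulated piece fell within the hardness limits")
    return in_both / in_x


# Even with a correlation as high as 0.9 the estimate stays below 1.
p_prime = conditional_probability(0.9, (-1.0, 1.0), (-1.0, 1.0))
```

Narrowing the hardness limits raises the estimate but rejects more good bars, which is the economic balance between cost of inspection and cost of rejection that the paper goes on to mention.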

The establishment of inspection methods to assure the quality of 
product in such an instance not only necessitates the application of 
statistical theory now available in the literature but even of an exten- 
sion of this theory. 

CHART IV
[Axes: tensile strength against hardness.]

We have seen that, whereas the philosophy of industrial development
of the old era attributed importance to the statistical method only
in the handling of errors of measurement, the philosophy of industrial
development today takes as two of its fundamental postulates the statistical
concept of physical property and the statistical concept of physical
law. Furthermore, industry today is becoming more and more
aware of the fact that it must rely upon statistical method to furnish
a rational basis for establishing economic standards of quality of raw
materials and finished products; to assist in obtaining minimum overall
variability in finished products; to provide ways and means of reducing
to a minimum the cost of inspection and the cost of rejection; and to
give to the consumer maximum assurance that the quality of product
is being maintained.

THE NORMAL HYPOTHESIS 


By Burton H. Camp 


It is very commonly assumed, in the theory underlying statistical 
practice, that certain types of distributions are normal. It is my pur- 
pose to review some of the methods where this assumption is made, to 
consider the reasons, if any, which may justify the assumption, and the 
conclusions we may properly extract from them. Manifestly, there is 
one important part of statistics which is constantly involved in this 
hypothesis, namely, sampling. I shall not, however, consider sam- 
pling at all. 

1. The Measurement of Ordered Series. My first example has to do 
with ordered series. We mean by this such a distribution as one gets 
if he attempts to measure things for which we have as yet no satisfac- 
tory unit of measurement, such as: healthfulness, intelligence, integrity, 
religious fervor, the beauty of a poem, the dullness of a lecture, the 
grade of an investment, the optimism of a broker. For the measure- 
ment of these things we have no yard stick. We can, however, classify 
them. We can arrange them in order in a series. 

Such an order may consist of scores made on a test. Peter scores 
150; James, 100; and John, 50. Peter, James and John are, we believe, 
truly in that order as regards their aptitudes for that particular subject, 
but we are not willing to believe that the difference in aptitude between 
Peter and James is the same as the difference between James and John, 
just because the scores said so. Educationalists have long recognized 
this difficulty. 

Suppose we have such a set of scores, and that we do wish to sub- 
stitute for them a set of numbers which will be truly a set of measure- 
ments. The common method is the following one: Assume that the 
true frequency distribution is a normal curve. If now it happens that 
our set of given scores is a normal set, then these scores are the true 
ones sought; if not, there is a simple method by which they may be 
transmuted into other scores that will be normal. 
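
The paper does not spell out the simple method it has in mind. One common procedure, sketched here as an assumption, ranks the raw scores, converts each rank to a percentile position, and reads off the corresponding deviate of the standard normal curve. The function name and the (rank − ½)/n convention are choices made for this illustration.

```python
from statistics import NormalDist
from typing import List, Sequence


def normalized_scores(raw_scores: Sequence[float]) -> List[float]:
    """Replace raw scores by normal deviates that preserve their order.

    Tied scores share the average of the ranks they occupy. The
    percentile position of rank k among n scores is taken as (k - 0.5) / n.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and raw_scores[order[j + 1]] == raw_scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Peter, James and John from the example above.
transmuted = normalized_scores([150, 100, 50])
```

With only three scores the result is necessarily symmetric about James; the transformation becomes informative only with a group large enough for its shape to be judged.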

Can we justify this proceeding? In the case of intelligence there 
are at least two arguments for it. (a) The first has been the subject 
of lengthy investigations by the psychologists, notably by E. L. 
Thorndike, and has to do with what may be called least discernible 
differences. The questions of a test are made to differ, the one from the 
next, by as little as possible, as judged by a number of psychologists; 
then it does turn out that the raw scores obtained by the use of these 
tests are actually crudely normal.

(b) The second justification is the argument from analogy. There 
are also certain other possible considerations that have some force, one 
involving so-called elementary causes, so useful in target shooting, but 
I think most of us will agree that finally all we are really willing to 
believe, even as a working hypothesis, is that these distributions are 
crudely normal—better than a saddle, not so good as a bell, not so bad 
as a rectangle, some sort of unimodal distribution, in general with not 
an extreme amount of skewness, or of kurtosis. Our transmuted 
scores, then, are not really the exact things they appear to be. Let us 
remember this as we proceed to make use of these measurements; let 
us trust them somewhat but not too far. One reason why it is so
desirable to substitute numbers in place of the various classes of an 
ordered series is that numbers are needed when we want to correlate 
one series with another. 

2. The Correlation of Ordered Series. Let us now consider the correlation
of two ordered series and also the correlation of an ordered with
a measured series. A table has been constructed¹ showing, for forty-six
states of the United States, the relation between the laxity of
divorce laws and the divorce rate. This table is not supposed to 
prove anything. It has been hastily thrown together, and is only 
to illustrate a method. A means that the divorce laws are very 
strict. New York and North Carolina are included in this group. 
F means that they are very lax, as in the Dakotas. The states of 
South Carolina and Nevada were omitted, because laws there are so 
peculiar. In the first you cannot get a divorce, no matter what you 
do, and in the second you can hardly fail to. We have here, then, an 
ordered series based on laxity of laws, and a measured series based on 
the counted numbers of divorces. If we wish to correlate the two we 
must replace the classification, A to F, by numbers; and the well-known 
method is to replace them by such numbers as will indicate the mean 
positions of the various classes on a normal scale. In so doing we 
tacitly assume that were it possible for some more legally minded 
statistician to measure accurately the relative severity of the laws in 
these several states he would obtain as a result a normal distribution. 
If instead of a measured distribution for the other variate we had there 
also only an ordered classification, we should proceed by assuming 
normality there also. In the case before us, using Sheppard’s correc- 
tion to eliminate grouping error, I found r equal to about 0.39. But 
since the fundamental hypothesis is only crudely satisfied at most, I 
do not know my r nearly to that degree of accuracy. All I am reasonably
sure of is that the correlation in these states is in the neighborhood
of 0.4, and I know this only if I am ready to assume that the distribution
is in the neighborhood of the ideal one.

¹ B. H. Camp, The Mathematical Part of Elementary Statistics, p. 171.
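
The replacement of ordered classes by their mean positions on a normal scale can be sketched as follows. For a class lying between cumulative proportions p_lo and p_hi, with corresponding normal deviates z_lo and z_hi, the mean position is (φ(z_lo) − φ(z_hi)) / (p_hi − p_lo), where φ is the ordinate of the standard normal curve. The class frequencies below are hypothetical; the table cited in the paper is not reproduced here.

```python
from statistics import NormalDist
from typing import List, Sequence


def class_means_on_normal_scale(frequencies: Sequence[int]) -> List[float]:
    """Mean normal deviate of each ordered class, given class frequencies.

    Every frequency must be positive. The classes are assumed to be
    successive slices of a single standard normal distribution.
    """
    standard_normal = NormalDist()
    total = sum(frequencies)
    means = []
    cumulative = 0
    for f in frequencies:
        p_lo = cumulative / total
        cumulative += f
        p_hi = cumulative / total
        # The ordinate vanishes at the two open ends of the scale.
        pdf_lo = standard_normal.pdf(standard_normal.inv_cdf(p_lo)) if 0 < p_lo < 1 else 0.0
        pdf_hi = standard_normal.pdf(standard_normal.inv_cdf(p_hi)) if 0 < p_hi < 1 else 0.0
        means.append((pdf_lo - pdf_hi) / (p_hi - p_lo))
    return means


# Hypothetical counts of forty-six states in classes A to F.
hypothetical_counts = [4, 8, 12, 11, 7, 4]
class_scores = class_means_on_normal_scale(hypothetical_counts)
```

The scores so obtained can be correlated with the measured series in the ordinary way, subject to the caution the paper attaches to the result.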

Percy Stocks examined the relation between the presence of goitre 
and cancer in 5,147 cases in London hospitals and found a four-fold 
table¹ from which he derived tetrachoric r = 0.4587 ± .0286, using
Pearson’s method. I think that tetrachoric r has not been welcomed 
on this side of the water as cordially as it ought to be; and the reasons, 
I take it, are two: one is its forbidding, foreign sounding name, shared 
also by its even more difficult neighbor, polychoric r; and the other is 
that, if one wishes to obtain the degree of accuracy desired by Pearson, 
it often requires a very lengthy computation. But if, as I shall show 
you directly, this high degree of accuracy is spurious, and if, by a com- 
bination of his methods and others, its value may be found to the 
approximation warranted in the premises both easily and quickly, the 
second objection is obviated. Tetrachoric r is that r which would 
have been found by the usual methods if our data had been given 
initially so that a fine division into many cells was possible—provided 
only that that finer division would have presented a normal surface. 
It is therefore an easily interpretable coefficient, and for that reason 
preferable to the substitutes which have been widely advertised.²

With this brief explanation in mind, let us now return to an ap- 
praisal of Stocks’s example. The probable error means very little, for 
more important than it, or even perhaps than the error due to non- 
randomness, is the error due to the fact that, since the distribution 
probably is not normal, the method of computation is itself only a crude 
approximation. I do not know how great this error is, but provision- 
ally, from experience with numerical examples, I believe it may easily 
amount to as much as 0.1. If, therefore, one is willing to put up with 
that degree of inaccuracy in the final result, it takes only five or ten 
minutes to compute tetrachoric r. There are three cases: (a) when 
the numerical value of r is less than 0.2; then Pearson’s own method is 
simple to apply; (b) when the numerical value of r is greater than 0.8; 
then the tables of Pearson and Lee can be used without more than a 
very approximate interpolation in them; finally (c) in the intermediate 
cases; I suppose others besides myself may have experimented with 
what amounts to a specialization of Pearson’s polychoric r. For 
Stocks’s problem this method gives r=0.42. 
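
With modern computing the definition of tetrachoric r can be applied directly. The sketch below is neither Pearson's series nor the shortcut described above. It finds, by bisection, the correlation of a normal surface whose upper quadrant holds the observed proportion, using the classical identity that the quadrant probability equals the product of the marginal tail areas plus the integral of the bivariate normal ordinate from 0 to r. The function names and the fourfold table are hypothetical; Stocks's figures are not given in the paper and are not reproduced.

```python
import math
from statistics import NormalDist


def _upper_quadrant(h: float, k: float, r: float, steps: int = 400) -> float:
    """P(X > h, Y > k) for a standard bivariate normal with correlation r."""
    standard_normal = NormalDist()
    base = (1 - standard_normal.cdf(h)) * (1 - standard_normal.cdf(k))
    if r == 0:
        return base

    def ordinate(t: float) -> float:
        s = 1 - t * t
        return math.exp(-(h * h - 2 * t * h * k + k * k) / (2 * s)) / (
            2 * math.pi * math.sqrt(s)
        )

    # Simpson's rule for the integral of the ordinate over t from 0 to r.
    width = r / steps
    total = ordinate(0.0) + ordinate(r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * ordinate(i * width)
    return base + total * width / 3


def tetrachoric_r(a: int, b: int, c: int, d: int, tol: float = 1e-6) -> float:
    """Tetrachoric r for the fourfold table [[a, b], [c, d]].

    a counts cases with both characters present, d with both absent.
    """
    n = a + b + c + d
    standard_normal = NormalDist()
    h = standard_normal.inv_cdf(1 - (a + b) / n)  # dividing line for the first character
    k = standard_normal.inv_cdf(1 - (a + c) / n)  # dividing line for the second character
    target = a / n
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if _upper_quadrant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical table with both divisions at the median.
r_hypothetical = tetrachoric_r(200, 100, 100, 200)
```

The arithmetic is exact to several decimals, but as the paper argues, that precision says nothing about how nearly the underlying surface is normal.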

¹ See Biometrika, vol. 16, p. 385.
² The usual interpretation of the substitutes consists, in fact, in telling how close they come to r when
the distribution is normal.

3. Partial Correlation. A good picture of a multiple correlation solid
is a box of honey in the honey comb. The various cells of the honey
comb are the cells of the correlation solid (let us think now of three
dimensions only), and the amount of honey in any cell is the frequency.
The box is, say, resting on the desk. The axis of x goes to the right,
y goes forward, and z upwards.

If we cut out a slab of honey, one cell wide, by two planes perpendicular
to the x-axis, we have taken out essentially an ordinary two-way
correlation table, extending in the z and y directions, upwards and
forward. The ordinary correlation of z and y in this solid is what
we mean by the partial correlation, z on y. Because, in general, this
will be different for different slabs, this partial correlation depends on
x also, and may be written $r_{zy \cdot x}$ or $r_{zy}(x)$, or more simply r(x). It is
a simple notion, akin to partial differentiation.

I said that the partial correlation was different for different slabs, in 
general. If, however, the distribution of the drops of honey in this 
honey comb is what is called a normal solid, then these partial correla- 
tions are all alike and are given by the much used formula 


$$r_{zy \cdot x} = \frac{r_{zy} - r_{zx}\, r_{yx}}{\sqrt{1 - r_{zx}^{2}}\,\sqrt{1 - r_{yx}^{2}}}.$$


This formula may be proved subject to another group of assumptions 
which are not quite tantamount to the assumption of normality, but 
they are so nearly so that I shall treat them as if they were. Essen- 
tially, then, the above expression holds good for the true partial corre- 
lation, r(x), if, and only if, the honey is normally distributed. 
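
For reference, the formula is a one-line computation from the three ordinary correlations. The function name and the example values are illustrative.

```python
import math


def partial_correlation(r_zy: float, r_zx: float, r_yx: float) -> float:
    """Partial correlation of z and y with x held fixed, from pairwise r's."""
    return (r_zy - r_zx * r_yx) / math.sqrt((1 - r_zx**2) * (1 - r_yx**2))


# Hypothetical correlations: z with y, z with x, y with x.
example = partial_correlation(0.6, 0.5, 0.5)
```

Whether the number so obtained describes every slab of the honey is exactly the question of normality raised in the text; the computation itself does not settle it.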

This is a severe assumption. One is rather unwilling to grant the 
arguments for the normal distribution of intelligence. That is a one- 
way distribution. It is more difficult to agree to a normal distribution 
in Stocks’s problem. That is a two-way distribution. The honey is 
a three-way distribution. 

But this is not the worst of it. Most workers with partial correla- 
tions are not content with three-way spaces: they need often half a 
dozen and sometimes more. The corresponding formula holds for n- 
way space. The mathematical analysis leading to it is elegant, but as 
we mount upward the restriction becomes more and more difficult to 
accept. Moreover, we are in a dilemma; it appears to be that or 
nothing, for one may not say to the research worker, “Throw away
your formula and compute your partial correlation directly from your
data.” He simply does not have and cannot get the data needed for
such a purpose. Consider that proposition for a moment. He needs 
to confine his attention to a two-way slab in a six-dimensional box of 
honey, and to compute therefrom an ordinary coefficient of correlation. 
Suppose his box measures 10 cells on a side, and that he has as many
as 5,000 data (practically, a very large number). Can you guess how 
many data he would have, on the average, in his slab? Only one-half 
of one datum. In the more densely populated regions of course there 
would be more, but obviously not enough, even were all the necessary 
information available, to permit a computation of a correlation 
coefficient. 
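
The arithmetic behind the figure of one-half of one datum is short. A two-way slab in a box of six dimensions is picked out by fixing the cell of each of the other four variables, so there are 10⁴ slabs among which the 5,000 data are shared. The function name below is illustrative.

```python
def average_data_per_slab(n_data: int, dimensions: int, cells_per_side: int) -> float:
    """Average number of observations in one two-way slab of the box."""
    slabs = cells_per_side ** (dimensions - 2)  # one slab per setting of the other variables
    return n_data / slabs


per_slab = average_data_per_slab(n_data=5000, dimensions=6, cells_per_side=10)
```

Each further variable divides the figure by ten again, which is why direct computation becomes hopeless so quickly.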

So it is the old story over again: we cannot get what we want exactly, 
so we have to make a few suppositions and hope for the best. The 
most convenient thing to suppose is this nice normal hypothesis, with 
the theory all worked out and the formulae easy to apply; but unfor- 
tunately in this case of many dimensions, even if we were willing to 
presuppose a crudely normal distribution, we should be very fearful of 
the results. Here it seems quite likely that a small deviation from the 
conditions of our hypothesis would mean a large deviation from the 
conditions expected in our conclusions. It is better to seek another 
line of defense. Some of you have perhaps already been reflecting that 
this formula may be derived without the use of the normal hypothesis. 
It cannot be shown to be the value of the partial correlation without 
that hypothesis (or essentially that), but it can be proved to be some- 
thing analogous to the partial correlation. Briefly and inaccurately, it 
may be said that the above expression may be thus proved to be a 
sort of average value of the various true partials I have described. In 
this sense it is better described as net correlation. If, therefore, one 
is careful to use it in this sense only, one is justified in using it inde- 
pendently of whether his distribution is normal or not. But if, on the 
other hand, as sometimes happens, one does desire the correlation for 
some special value of x, of course we do not know that this so-called
average value is near the special one desired.
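
One distribution-free reading of the formula, offered here as an illustration and not necessarily the derivation the author has in mind, is that it equals the ordinary correlation between the residuals of z and of y after each has been fitted by a straight line in x. The identity is algebraic and holds for any data, normal or not. The sketch checks it on deliberately skew artificial data; all names and the data-generating choices are hypothetical.

```python
import math
import random
from typing import List, Sequence


def correlation(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary product-moment correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(target: Sequence[float], predictor: Sequence[float]) -> List[float]:
    """Residuals of target after a least-squares straight line in predictor."""
    n = len(target)
    mt, mp = sum(target) / n, sum(predictor) / n
    slope = sum((p - mp) * (t - mt) for p, t in zip(predictor, target)) / sum(
        (p - mp) ** 2 for p in predictor
    )
    return [(t - mt) - slope * (p - mp) for p, t in zip(predictor, target)]


def net_correlation(z: Sequence[float], y: Sequence[float], x: Sequence[float]) -> float:
    """The partial correlation formula applied to sample correlations."""
    r_zy, r_zx, r_yx = correlation(z, y), correlation(z, x), correlation(y, x)
    return (r_zy - r_zx * r_yx) / math.sqrt((1 - r_zx**2) * (1 - r_yx**2))


rng = random.Random(1)
x = [rng.expovariate(1.0) for _ in range(2000)]  # skew, far from normal
y = [xi + rng.uniform(-1.0, 1.0) for xi in x]
z = [xi * xi + 0.5 * yi + rng.uniform(-1.0, 1.0) for xi, yi in zip(x, y)]

from_formula = net_correlation(z, y, x)
from_residuals = correlation(residuals(z, x), residuals(y, x))
```

The two values agree to rounding error, which supports using the formula as a net correlation while leaving open, as the text warns, what it says about any particular value of x.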

Statistics is often called the science of averages. I have tried to 
point out for this small group of cases that statistics is the science of 
averages in a more penetrating sense than the phrase commonly implies. 
Many of the formulae themselves are valid on the average only. The 
mere presence of limitations, however, does not mean that the formulae 
do not have real worth. They are, perhaps, comparable to many of the 
formulae of medical science, ineffective in a few cases, but invaluable 
in most.

CLASSIFICATION OF SIZES OR MEASURES BY 
FREQUENCY FUNCTIONS 


By Edward L. Dodd


Nearly everyone is familiar with frequency tables giving frequency 
distributions. For example: 


DISTRIBUTION OF HEIGHT OF A GROUP OF 380 MEN
(Height in inches to the nearest half-inch)

Height       66   66.5   67   67.5   68   68.5   69   69.5   70
Frequency    30   40     46   49     50   49     46   40     30





This means that 30 men were found to be 66 inches in height; 40 
men, 66.5 inches in height; etc.

If all distributions were as simple as the one postulated above, 
mathematicians would have a comparatively easy time in their efforts 
to describe distributions, and statisticians in general would find it 
comparatively easy to understand the mathematicians. For everyone 
knows what a circle is; and the above distribution may be described as
circular, or more specifically as semi-circular.


CHART I
[Frequency diagram: frequency (men) on the vertical axis; height from 66 to 70 inches on the horizontal axis.]


Using a diagram with rectangles of base one-quarter inch and alti- 
tudes drawn to the scale, one inch = 40 men, the distribution of altitudes 
or frequencies is such that a circle of radius 1¼ inches passes through
the center of the upper bases of the rectangles. Drawn to a different 
scale, the curve would in general be elliptical instead of circular.

Now the difficulty is right here: very few curves have simple names
that are generally understood. Indeed, mathematicians seldom learn
the names of many curves. It is easier to describe a curve by an
equation, as in analytic geometry. Here we draw two perpendicular
lines, called axes, represent distances from these axes by x and y, and write

$$x^2 + y^2 = r^2$$

as the equation of a circle of radius r and center at the origin, relying
upon the theorem of plane geometry that the square on the hypotenuse
of a right triangle is equal to the sum of the squares on the other two sides.
Solved for y the equation becomes

$$y = \sqrt{r^2 - x^2}.$$

CHART II
[Diagram: a point P at distances x and y from two perpendicular axes.]


This is a frequency function, giving for sizes or measures x the frequency
of those sizes when the distribution is circular, as in the illustration
given above. The mid-ordinate or altitude of each rectangle
reaches to the circle. In the illustration, the y itself, as an ordinate, 
gives the frequency. Very often, it is more convenient to represent 
frequency by the area beneath the frequency curve between the ordi- 
nates erected vertically at the ends of the range of sizes considered. 
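
The circular frequency function can be checked against the table of heights. The sketch below uses the scales stated for the diagram: quarter-inch bases for half-inch classes, one inch to 40 men, and a circle of radius 1¼ inches. The centering of the circle at 68 inches and the constant and function names are assumptions made for the illustration.

```python
import math

MEN_PER_DIAGRAM_INCH = 40          # vertical scale of the diagram
RADIUS_INCHES = 1.25               # radius of the circle on the diagram
CENTER_HEIGHT = 68.0               # assumed center of the circle
DIAGRAM_INCHES_PER_HEIGHT_INCH = 0.5  # a half-inch class has a quarter-inch base


def semicircle_frequency(height: float) -> float:
    """Frequency given by y = sqrt(r^2 - x^2), converted to men."""
    x = (height - CENTER_HEIGHT) * DIAGRAM_INCHES_PER_HEIGHT_INCH
    y = math.sqrt(RADIUS_INCHES**2 - x**2)
    return MEN_PER_DIAGRAM_INCH * y


heights = [66 + 0.5 * i for i in range(9)]
fitted = [round(semicircle_frequency(h)) for h in heights]
```

Rounded to whole men, the fitted values reproduce the nine tabulated frequencies and their total of 380.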

I have tried to indicate in quite elementary language what a fre- 
quency function is. It is generally possible to find a rather simple 
function which will with fair approximation give observed tabular 
frequencies. This function, then, may be said to classify data into 
constituents by size. A great deal of statistical mathematics centers 
about the subject of frequency functions—some of this mathematics is 
by no means simple. In what follows, I shall have to assume that the 
reader has some knowledge of analytic geometry, algebra and calculus. 

Pretorius¹ has just recently (July, 1930) given a rather comprehensive
survey of frequency functions (curves) as preliminary to his discussion
of frequency surfaces. He notes in particular (p. 112):

“(a) Pearson’s system of skew frequency curves derived from the
generalized probability equation:

$$\frac{1}{y}\frac{dy}{dx} = \frac{x + a}{b_0 + b_1 x + b_2 x^2};$$


“(b) (i) Edgeworth’s generalized law of error,
(ii) The Gram-Charlier Type A and Type B series;
“(c) The translated, or transformed, curves of Edgeworth, and of
Galton and MacAlister, as treated by Pearson and Wicksell.”

¹ S. J. Pretorius, “Skew Bivariate Frequency Surfaces Examined in the Light of Numerical Illustrations,” Biometrika, Vol. 22, 1930, pp. 109-223.

Using the notation adopted for the Pearson Types, with $\mu_r$ denoting
the $r$th moment,

$$\beta_1 = \mu_3^2/\mu_2^3, \qquad \beta_2 = \mu_4/\mu_2^2;$$


and using Pearson’s tetrachoric functions, 


[equation illegible in source: definition of the tetrachoric functions]


Pretorius writes Edgeworth’s function in the form 


[equation illegible in source: Edgeworth’s function expressed in tetrachoric functions]


With the last term omitted, this is the Gram-Charlier Type A up to the 
moments of the fourth order only. Types A and B are expressed fully 
as infinite series, the latter being a generalization of the Poisson limit 
to the binomial. 

Reading this paper of Pretorius with its rather comprehensive bibli- 
ography, one may get into touch with a great deal of literature dealing 
with frequency functions of one and more than one variable—among 
the latter, Karl Pearson’s¹ paper, which deals with the function


$$z = z_0\, e^{-(x^2 - 2rxy + y^2)/2(1 - r^2)}\, P,$$


where P is a general polynomial of the fourth degree in $x/\sigma_1$ and $y/\sigma_2$.
In spite of the great generality of Pearson’s surface with its 15 param- 
eters, Pearson (p. 309) is not wholly satisfied with it. He would 
prefer a generalization of his Types to two variables if a satisfactory 
generalization could be effected. 

A generalization for three of these Types for one variable was given
by Romanovsky,² who multiplied these Types by a polynomial. The
use of these generalized types required moments higher than the fourth;
and Pearson remarks (p. 117) in an appended note: “I have a very firm
conviction that the mathematician who uses high moments may make
interesting contributions to mathematics, but he removes his work from
any contact with practical statistics.” In this connection, also, Pearson
mentions an unpublished generalization of the Types by Dr. David
Heron, who raised the degree of the polynomial denominator in the
Pearson differential equation. The possibilities of this generalization
were explored recently by Mouzon.¹

¹ Karl Pearson, “The Fifteen Constant Bivariate Frequency Surface,” Biometrika, Vol. 17, 1925, pp. 268-313.
² V. Romanovsky, “Generalization of Some Types of the Frequency Curves of Professor Pearson,” Biometrika, Vol. 16, 1924, pp. 106-116.

Although magnificent and far-reaching results have been obtained in 
setting up and elaborating frequency functions and families of such 
functions, we seem to be without a general method for handling all 
cases that arise, even in the case of one variable. With regard to two 
variables, Pretorius ² (p. 221) says: “It can be said, in conclusion, that 
after more than thirty years the problem still remains the ‘most urgent 
task before mathematical statisticians.’” 

One of the difficulties involved can be seen as follows. Rietz ³ starts 
with a postulated normal distribution of diameters of spheres, ‘‘such as 
oranges on a tree or peas on a plant,” and investigates the distribution 
of volumes. The curve (pp. 295, 296) has an anomaly at the origin, 
not representable by a Pearson Type or a Charlier A-Series to fourth 
moments. 

Wicksell ⁴ studies the “corpuscle problem,” which comes up in 
biology in connection with a section of a gland which contains corpuscles 
(like the islands in the pancreas) and arises again in stellar statistics 
in interpreting the photograph (the section or projection) of the stars 
in a globular cluster; and again (Vol. 18, p. 170) in connection with 
physical chemistry, where corpuscles of cementite (Fe₃C) imbedded in 
pure iron are dissolved at a rate depending upon the total surface of 
the corpuscles. 

The required distribution is that of the volumes or surfaces of 
spheres or ellipsoids. The observed distribution is that of circles or 
ellipses on a plane section not in general passing through the center of 
these solids. 

Rider ⁵ in a recent issue of the Annals of Mathematics exhibits the 
distributions of a great variety of statistical parameters, giving a rather 
comprehensive five-page bibliography. It would be rather surprising 
if some of these distributions—such as in (79) p. 605, involving a double 
exponential, much like a Makeham mortality curve—could be given a 
good fit by frequency curves in the systems most used. 

1 Edwin S. Mouzon, Jr., ‘‘Equimodal Frequency Distributions,’’ Annals of Mathematical Statistics, 
Vol. 1, 1930, pp. 137-158. 

2 S. J. Pretorius, “Skew Bivariate Frequency Surfaces Examined in the Light of Numerical Illustrations,” 
Biometrika, Vol. 22, 1930, pp. 109-223. 

3 H. L. Rietz, “Frequency Distributions Obtained by Certain Transformations of Normally Distributed 
Variates,” Annals of Mathematics, Vol. 23, 1923, pp. 292-300. 

4 S. D. Wicksell, “The Corpuscle Problem. A Mathematical Study of a Biometric Problem,” 
Biometrika, Vol. 17, 1925, pp. 84-99; “Second Memoir. Case of Ellipsoidal Corpuscles,” Vol. 18, 

5 Paul R. Rider, “A Survey of the Theory of Small Samples,” Annals of Mathematics, Vol. 31, 1930, 
pp. 577-628. 










Perhaps enough has been said to illustrate how easily some very 
simple frequency curve may lead to another of a very complex nature. 
In fact, almost any frequency function may be the outcome of any 
other frequency function through the use of the proper transforming 
function. 

Thus, we need not be surprised to find bimodal curves which are 
beyond the reach of the Pearson Types and curves with vertical 
tangents poorly fit by a Charlier Type A. Or, again, while a fair fit 
may be possible with one of the Types, a better fit may be obtained 
otherwise. Thus “Student” ¹ shows a rather good approximating 
curve of Type I, which approximates the actual distribution of samples 
of two, the actual distribution being the right half of a normal 
distribution. Again, Irwin ² for the distribution of individual differences 
after considering two families of curves commonly used, selects (pp. 
116, 117) a tail of a normal curve. 

In another paper, Irwin ³ is led to results that are rather suggestive. 
For the means of samples of n, he finds (pp. 237, 238) that when n=2, 
the frequency distribution is an isosceles triangle, 

$$y = 2\{1 - 2|x|\},$$

while for n=4, the curve is a smooth bell-shaped curve, made up, 
however, of distinct curves with identical tangents at the points of union. 
The equation of each section involves |x|³. But |x|³ is a somewhat 
peculiar function. It may be written 

$$|x^3| = |x| \cdot x^2 = x^3 \operatorname{sgn} x,$$

where sgn x takes on values −1, 0, or +1 according as x is negative, 
zero or positive. This sgn x or signum x has played an important rôle 
in the theory of probability and its applications to statistics; see, e.g., 
Czuber’s ⁴ treatment of “Kollektivmasslehre” (pp. 400 seq.). And 
even in the most elementary statistics we are constantly dealing with 
the mean deviation, that is, the average value of |x| = x sgn x, where x 
is a deviation from some central value. 

In the definition of a linear correlation ratio, Pearson ⁵ makes implicit 
use of sgn(x − x̄), in the sense that the introduction of this factor 
would have enabled two sums to be written as a single sum, and a 

1 “Student,” “Errors of Routine Analysis,” Biometrika, Vol. 19, 1927, pp. 151-164. 

2 J. O. Irwin, “The Further Study of Francis Galton’s Individual Difference Problem,” Biometrika, 
Vol. 17, 1925, pp. 100-128. 

3 J. O. Irwin, “On the Frequency Distribution of the Means of Samples from a Population Having 
Any Law of Frequency with Finite Moments, with Special Reference to Pearson’s Type II,” Biometrika, 
Vol. 19, 1927, pp. 225-239. 

4 Emanuel Czuber, Wahrscheinlichkeitsrechnung, Vol. 1, Third Part, “Kollektivmasslehre.” 

5 Karl Pearson, “On First Power Methods of Finding Correlation,” Biometrika, Vol. 17, 1925, pp. 
459-469. 



















later paper by J. R. Musselman ¹ elaborates this. Of course, the use of 
|x| or in general of $|x|\,x^{r-1}$ is accompanied with some difficulties; but 
these are not always insuperable. If the distribution mentioned above 
obtained by Irwin were regarded as data to be fitted, the most perfect 
fit (coincidence) could only be obtained by the use of powers of |x|. 

In a brief note,² I suggested the use of such functions for curve fitting. 
Thus, I defined the rth “conjugate moment” as formed by using 
$x^r \operatorname{sgn} x = |x|\,x^{r-1}$ in place of $x^r$. The chief advantage lies in the fact 
that without going to moments higher than the second, we can obtain 
six moments (the zeroth, first and second and their conjugates) and 
thus avoid the third and fourth moments, which are sometimes 
unsatisfactory when a distribution has irregular ends. 
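
The six low moments just named may be illustrated by a minimal sketch, assuming ungrouped observations already expressed as deviations from the chosen origin. The function names `sgn`, `moment` and `low_moments` and the sample deviations are hypothetical, introduced here for illustration only.

```python
def sgn(x):
    """Signum: -1, 0 or +1 according as x is negative, zero or positive."""
    return (x > 0) - (x < 0)


def moment(xs, r, conjugate=False):
    """Mean of x**r, or of x**r * sgn(x) for the conjugate moment."""
    if conjugate:
        return sum((x ** r) * sgn(x) for x in xs) / len(xs)
    return sum(x ** r for x in xs) / len(xs)


def low_moments(xs):
    """Zeroth, first and second moments with their conjugates (six in all)."""
    return {
        (r, conj): moment(xs, r, conj)
        for r in (0, 1, 2)
        for conj in (False, True)
    }


# Hypothetical deviations from a central value.
deviations = [-3.0, -1.0, -0.5, 0.5, 1.5, 2.5]
for key, value in sorted(low_moments(deviations).items()):
    print(key, round(value, 4))
```

The key `(1, True)` is the conjugate first moment, which is simply the mean deviation; the key `(0, True)` measures the excess of positive over negative deviations.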

Furthermore, we may use fractional moments of two kinds employing 
$|x|^p$ and $|x|^p \operatorname{sgn} x$, where p need not be integral. In the paper of Rietz ³ 
quoted above, there was a passage from diameters to volumes, 
proportional to the cubes of these diameters; and in this connection we 
may note that the first moment for x is the one-third moment for $x^3$, 
taken about the same origin. 
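
The remark on the one-third moment can be checked numerically. The sketch below assumes positive diameters measured from zero; the function name `fractional_moment` and the sample values are hypothetical.

```python
def fractional_moment(values, p):
    """Mean of |v|**p; p need not be an integer."""
    return sum(abs(v) ** p for v in values) / len(values)


# Hypothetical diameters (arbitrary units), all positive, origin at zero.
diameters = [2.0, 2.4, 2.5, 2.7, 3.1]
cubes = [d ** 3 for d in diameters]  # proportional to the volumes

first_moment_of_x = fractional_moment(diameters, 1)
one_third_moment_of_cubes = fractional_moment(cubes, 1.0 / 3.0)

# The two agree up to floating-point rounding.
print(first_moment_of_x, one_third_moment_of_cubes)
```

For deviations of both signs, the conjugate form with sgn x would be needed to preserve the sign of each cube root.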

Pearson’s Table of the Incomplete Gamma Function makes com- 
paratively easy the computations involved in many instances of curve- 
fitting with fractional or conjugate moments. Problems of elimination 
arise, and sometimes supplementary tables would be useful. 

The curve that I considered in my note was 

$$y = e^{-x^2/2}\,(A + Bx + Cx^2 + \dots + Kx^{10}).$$

The alternate coefficients A, C, E, ... were to be obtained from the even 
moments; the others, separately, from the odd moments. For this 
form, a natural unit for x is its standard deviation. 

Usually we would not want to use more than six moments. These 
may all be low—the zeroth, first and second, with their conjugates. A 
great many possibilities present themselves. For example, we may 
write 


$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\,[1 + O(x)].$$

As the odd function O(x), we may take 

$$O(x) = Ax + B|x|x + Cx^3,$$


1 J. R. Musselman, “On the Linear Correlation Ratio in the Case of Certain Symmetrical Frequency 
Distributions,” Biometrika, Vol. 18, 1926, pp. 228-231. 

2 E. L. Dodd, “The Fitting of Curves by the Use of Moments and Conjugate Moments,” Abstract in 
Bulletin of the American Mathematical Society, Vol. 31, 1925, pp. 296-297. 

3 H. L. Rietz, “Frequency Distributions Obtained by Certain Transformations of Normally Distributed 
Variates,” Annals of Mathematics, Vol. 23, 1923, pp. 292-300. 













and make the first, the conjugate zeroth, and the conjugate second 
moment for the curve agree with those moments of the data. In 
solving for A, B and C the determinant denominator cannot vanish. 
Or, we may use in combination such odd functions as 


x^…, x^…, |x|^… sgn x, |x|^… sgn x; 
and get a variety of effects, some with an abruptness point. 
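
The solution for A, B and C may be sketched as follows. The sketch takes O(x) = Ax + B|x|x + Cx³, works in units of the standard deviation with unit total frequency, and uses the absolute moments of the unit normal curve. The helper names and the "observed" moments are hypothetical.

```python
import math


def abs_moment(k):
    """E|x|**k for a unit normal variate: 2**(k/2) * Gamma((k+1)/2) / sqrt(pi)."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


# Rows: first moment, conjugate zeroth, conjugate second.
# Columns: contributions of A*x, B*|x|*x and C*x**3.
M = [
    [abs_moment(2), abs_moment(3), abs_moment(4)],
    [abs_moment(1), abs_moment(2), abs_moment(3)],
    [abs_moment(3), abs_moment(4), abs_moment(5)],
]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m * (A, B, C) = rhs by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(det3(replaced) / d)
    return solution


# Hypothetical observed moments in sigma units (not from any data set).
observed = [0.05, 0.02, 0.08]
A, B, C = solve3(M, observed)
print(det3(M), A, B, C)
```

Under these assumptions the determinant reduces to √(2/π)·(5 − 16/π), which is small but not zero, so the three equations always have a solution.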
Again, we may write in more general fashion 


$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\,[E(x) + O(x)].$$

As the even function E(x), we may use such expressions as 
$$A' + B'x^2 + C'x^4, \qquad A' + B'x^2 + C'|x|^3, \qquad A' + B'|x| + C'x^2,$$


and solve for A', B', C', since the determinant involved does not vanish, 
when we use zeroth, second and conjugate first moments. That is, for 
observed and theoretical frequency we make equal the area or 
population, the standard deviation and also the mean deviation. With the 
Incomplete Gamma Function Table available, even more general 
functions are indicated as worthy of consideration, such as 


$$y = e^{f(x)}\,[E(x) + O(x)], \qquad f(x) = -c^p|x|^p.$$


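In case the even function E(x) is a constant, a formal solution for p is 
obtained by writing: 

$$\frac{\mu_2}{\mu_{(1)}^2} = \frac{\Gamma(1/p)\,\Gamma(3/p)}{[\Gamma(2/p)]^2} = G(p); \qquad p = G^{-1}\!\left(\mu_2/\mu_{(1)}^2\right),$$

where $\mu_{(1)}$ means the conjugate first moment. 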
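
A numerical version of this formal solution may be sketched as follows, assuming a curve of the form y = K·exp(−(c|x|)^p) with unit area. For such a curve the ratio of the second moment to the square of the mean deviation is Γ(1/p)Γ(3/p)/[Γ(2/p)]², a function of p alone. The names `G` and `solve_p`, the search interval, and the use of bisection are assumptions of this sketch; bisection stands in for the table look-up and is not the method of the text.

```python
import math


def G(p):
    """Gamma(1/p) * Gamma(3/p) / Gamma(2/p)**2, computed through log-gamma."""
    return math.exp(math.lgamma(1.0 / p) + math.lgamma(3.0 / p)
                    - 2.0 * math.lgamma(2.0 / p))


def solve_p(ratio, lo=0.2, hi=50.0, tol=1e-12):
    """Invert G by bisection, taking G to decrease as p increases on [lo, hi]."""
    if not (G(hi) < ratio < G(lo)):
        raise ValueError("ratio outside the range covered by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) > ratio:
            lo = mid   # G still too large, so p must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(solve_p(math.pi / 2.0))  # ratio for the normal curve
print(solve_p(2.0))            # ratio for the double exponential curve
```

The two ratios tried correspond to the normal curve (p = 2) and the double exponential curve (p = 1), which serve as checks on the inversion.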

As noted by Pearson ¹ (p. 459), in a related problem, difficulties arise 
in splitting up the frequency group in which the arithmetic mean lies. 
We could hardly expect |x| to be as tractable as x. However, we should 
remember that our main objective is to get a good fit, as evidenced by 
some suitable test, such as the chi-square, P test. Kirstine Smith ² 
finds (p. 264) that fitting by the chi-square condition itself generally 
leads to equations “far too involved to be directly solved,” although 
good approximations are often available. It would appear rather 
difficult to prove that any particular scheme of fitting by moments 


1 Karl Pearson, “On First Power Methods of Finding Correlation,” Biometrika, Vol. 17, 1925, pp. 
459-469. 

2 Kirstine Smith, “On the ‘Best’ Values of the Constants in Frequency Distributions,” Biometrika, 
Vol. 11, 1915-1917, pp. 262-276. 



















would always be the best. Perhaps for a distribution with extremely 
ragged ends, rather low moments would be preferable. 

But the use of integral moments up to the fourth has met with such 
success over such a wide range of material that it is likely to remain the 
most satisfactory method in general. 

One of the beauties of the Pearson method is its classification of 
frequency distributions into bell-shaped, J-shaped and U-shaped 
distributions, a chart giving indications of suitability. 

Beyond the horizon of actual curve-fitting with its comparatively 
low moments lies a field of considerable theoretic interest. A most 
valuable series of monographs, Traité du Calcul des Probabilités et de 
ses Applications, is coming through the press under the editorship of 
Emile Borel.¹ We find, for example in Tome I, Fascicule I, p. 121, a 
simple function 

$$f(x) = e^{-t}\sin t, \qquad t = \sqrt[4]{x}, \qquad x \geq 0,$$


with all its positive integral moments equal to zero. By means of this, 
a statement sometimes made that a frequency distribution is deter- 
mined by its moments, is shown to be untenable, without some 
qualification. 
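
The vanishing of these moments can be verified exactly. The sketch below assumes the function is e^(−t) sin t with t the fourth root of x. It substitutes x = t⁴ and uses the standard integral of t^m e^(−(1−i)t) over t ≥ 0, namely m!/(1−i)^(m+1). The function names are hypothetical; a square-root substitution is included only for contrast.

```python
from fractions import Fraction
from math import factorial


def gaussian_power(k):
    """(1 + i)**k as an exact pair of integers (real, imag)."""
    re, im = 1, 0
    for _ in range(k):
        re, im = re - im, re + im  # multiply by (1 + i)
    return re, im


def exp_sin_integral(m):
    """Exact integral of t**m * exp(-t) * sin(t) over t >= 0.

    Uses integral of t**m * exp(-(1 - i) t) = m! / (1 - i)**(m + 1)
    and 1 / (1 - i) = (1 + i) / 2, then takes the imaginary part.
    """
    _, im = gaussian_power(m + 1)
    return Fraction(factorial(m) * im, 2 ** (m + 1))


def moment(n, root=4):
    """n-th moment of exp(-t) sin(t) with t = x**(1/root), x >= 0.

    Substituting x = t**root gives dx = root * t**(root - 1) dt.
    """
    return root * exp_sin_integral(root * n + root - 1)


print([moment(n) for n in range(6)])          # fourth root
print([moment(n, root=2) for n in range(6)])  # square root, for contrast
```

With the fourth root, (1 + i) is raised to a multiple of four, which is real, so every moment is exactly zero; with a square root the moments do not all vanish.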

Castelnuovo ² in an excellent journal which has just made its 
appearance this year, with F. P. Cantelli as editor, brings the “problem of 
moments” up to date. 


1 Emile Borel, Traité du Calcul des Probabilités et de ses Applications, Avec la Collaboration de L. Blar- 
inghem, C. V. L. Charlier, R. Deltheil, H. Galbrun, J. Haag, R. Lagrange, F. Perrin, P. Traynard. 

2 Guido Castelnuovo, “Sul problema dei momenti,” Giornale dell’Istituto Italiano degli Attuari, Vol. 
1, 1930, pp. 137-169. 













THE AGRICULTURAL SITUATION AND ITS EFFECT ON 
BUSINESS IN 1931 


By Louis H. BEAN 


The single outstanding fact in the agricultural situation of 1929 and 
1930, not even excepting the drouth, is the business depression. The 
abrupt change from a period of prosperity to one of deep depression is 
now reflected in prices received by producers lower than any in many 
years, and also in a drop in income of major proportions. In analyzing 
the 1931 prospects for agriculture it is therefore necessary to take into 
account the probable course of business during 1931 as affecting the 
industrial demand for certain products and the consumer demand for 
food products. But in appraising that prospective demand we are 
faced with the fact that agriculture still plays an important rôle in 
business conditions, and we, therefore, are forced to take into account 
certain influences that the present agricultural situation may exert 
on business in 1931. 

For the purposes of this limited discussion, the complex agricultural 
situation in the United States can best be described in terms of agri- 
cultural production, prices and income, and its widely ramifying 
effects on business in 1931 can also be most clearly visualized if we 
attempt to trace a few of the respective influences that may arise 
from the current trends in farm prices, production and income. 

The 1930 business depression has in the course of a year wiped out 
all of the agricultural price advances that had been made since the 
former great depression in 1921. By the summer of 1921, prices re- 
ceived by producers had dropped to a level only 10 per cent above pre- 
war prices, but in the current depression they have declined to a level 
only 3 per cent above pre-war averages. As of November 15, 1930, 
the general average of prices received by farmers was nearly 25 per cent 
below that of November, 1929. Practically every section of agricul- 
ture has felt this price debacle. Cotton growers have seen their prices 
decline more than 40 per cent; wool prices have dropped 33 per cent; 
grain prices, 32 per cent; meat animal prices, 18 per cent; and prices of 
dairy and poultry products, 18 per cent. Only a small part of these 
declines can be attributed to increased production; the major cause lies 
outside American agriculture. As in the price deflation of 1920-1921, 
our international farm products show in the extent of their declines the 
combined influences of the domestic and international business situa- 
tion. This is revealed in the fact that the greatest declines in both 
















periods were experienced by cotton, wool and grains, and the smallest 
by meats and dairy and poultry products. 


TABLE I 
DECREASES IN PRICES RECEIVED BY PRODUCERS IN THE UNITED STATES 
BETWEEN NOVEMBER, 1929 AND 1930, AND BETWEEN 
JUNE, 1920 AND 1921 














Commodity                    November, 1929-1930     June, 1920-1921 
                                  Per cent               Per cent 
Cotton                               41                     74 
Wool                                 33                     60 
Grains                               32                     59 
[illegible]                          27                     38 
Meat animals                         18                     42 
[illegible]                          13                     27 
General average                      24                     53 




















In the current record of crop production, the outstanding feature is 
the great reduction in the two feed crops, corn and hay, brought about 
by the 1930 drouth. The total acreage in field crops harvested in 
1930—367,000,000 acres—was only one-half per cent greater than last 
year’s acreage, but yields were 5.4 per cent below the low yields of 1929 
and 8.9 per cent below average yields during the past 10 years, though 
not quite so low as those of 1921. In consequence of these low yields 
the aggregate production of feed crops in 1930 was 5 per cent below 
that of 1929. In the major cash crops, wheat showed an increase of 
5 per cent, cotton a decrease of 4 per cent, potatoes and tobacco prac- 
tically unchanged, while most of the fruit crops were larger than the 
unusually low production of 1929. 
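
The acreage and yield figures just quoted can be combined as a rough check on the decline of about 5 per cent in the physical volume of crop production. The sketch assumes, for illustration only, that a production index is the product of an acreage index and a yield index; the variable names are hypothetical.

```python
acreage_change = 0.005     # 1930 acreage one-half per cent above 1929
yield_vs_1929 = -0.054     # 1930 yields 5.4 per cent below 1929
yield_vs_average = -0.089  # 1930 yields 8.9 per cent below the 10-year average

# Assumption: production index = acreage index * yield index.
production_change = (1 + acreage_change) * (1 + yield_vs_1929) - 1

# Implied standing of 1929 yields relative to the 10-year average.
yield_1929_vs_average = (1 + yield_vs_average) / (1 + yield_vs_1929) - 1

print(f"production, 1930 vs 1929: {100 * production_change:.1f} per cent")
print(f"yields, 1929 vs average:  {100 * yield_1929_vs_average:.1f} per cent")
```

On this assumption total crop production comes out a little under 5 per cent below 1929, and the "low yields of 1929" were themselves between 3 and 4 per cent below the 10-year average.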

The production of live stock and live stock products also shows no 
great changes of marked significance. The production of cattle, 
judging from Federal Inspected slaughter, was in 1930 practically the 
same as in 1929, but hog slaughter was noticeably reduced. The production 
of milk has been maintained at the 1929 level except in the 
three summer months of the drouth, and the output of poultry and 
poultry products has also been maintained close to the 1929 output. 
From the standpoint of supply, the agricultural situation contains in it 
no general excessive production, but rather an accumulation of market 
stocks in the two major cash crops, cotton and wheat, due to the falling 
off in export demand and to the domestic business depression. 

The combination of falling prices and reduced output has again so 
reduced the farm income of 1930 as to wipe out all net returns to farm 
capital. Taking into account prices received so far this season and 
their relation to average prices for the entire season’s marketings, 
the following reductions in gross income have taken place: grains, 










$375,000,000, mostly on wheat; fruits and vegetables, $320,000,000; 
cotton and cottonseed, $600,000,000; live stock, $600,000,000; and 
dairy and poultry products, nearly $500,000,000. On all crops com- 
bined the reduction in gross income totals a billion dollars and on live 
stock and live stock products, $1,400,000,000. Not since the deflation 
of 1919-1921 has the annual farm income been so widely and so greatly 
reduced. The aggregate gross income from the farm production of 1930 
is in round numbers down to $9,430,000,000, which is $2,400,000,000 
less than in 1929, or a reduction of 20 per cent to a level only 
slightly above the gross income of the worst year of the post-war de- 
pression, namely 1921-1922. In magnitude, absolute or percentage, 
this drop in the purchasing power of farmers in the United States is 
almost identical with the drop of over two billion dollars, or 20 per 
cent, in the money incomes of factory employees. The financial bear- 
ing of this enormous reduction in gross income may be seen in the fact 
that the net income available for all agricultural capital and manage- 
ment in recent years has averaged about $2,200,000,000. In other 
words, the financial results of the 1930 farm production leave no net 
income except that which may have been made possible by somewhat 
lower costs of labor and farm supplies. Such are some of the facts in 
the present agricultural situation. 
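
The round figures for the aggregate can be reproduced from the totals of Table II. The sketch below uses only the 1929 and 1930 totals as printed there, in million dollars; the dictionary layout and names are illustrative.

```python
# Gross income from farm production, million dollars (totals from Table II).
income = {
    "crops":      {1929: 5603, 1930: 4226},
    "live stock": {1929: 6249, 1930: 5208},
    "total":      {1929: 11851, 1930: 9434},
}

for group, by_year in income.items():
    drop = by_year[1929] - by_year[1930]
    pct = 100.0 * drop / by_year[1929]
    print(f"{group:10s} down {drop:5d} million dollars, {pct:4.1f} per cent")
```

The total line gives the reduction of roughly $2,400,000,000, or about 20 per cent, cited above.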

What effects on business in 1931 may be expected from an agricultural 
situation characterized by unusually low prices being received by 


TABLE II 


GROSS INCOME FROM FARM PRODUCTION BY GROUPS OF 
COMMODITIES, 1924-1930 





























Source of income                           1924     1925     1926     1927     1928     1929     1930 
                                                          (million dollars) 
Crops: 
  Grains                                  1,755    1,496    1,432    1,592    1,513    1,285      910 
  Fruits and nuts                           671      683      694      690      705      727      566 
  Vegetables                                953    1,193    1,093    1,062      967    1,162    1,015 
  [illegible]                               104       95      103      104       92      100      105 
  Cotton and cottonseed                   1,710    1,740    1,251    1,464    1,470    1,376      779 
  [illegible]                               259      251      237      257      278      286      211 
  [illegible]                               719      689      659      649      650      667      640 
  Total crops                             6,170    6,147    5,468    5,817    5,675    5,603    4,226 
Live stock and live stock products: 
  Cattle, hogs and sheep                  2,380    2,822    2,922    2,664    2,727    2,827    2,280 
  Poultry and eggs                          989    1,114    1,167    1,108    1,202    1,256    1,024 
  Dairy products                          1,678    1,759    1,805    1,911    1,994    2,045    1,810 
  [illegible]                                87       97       88       86      111       93       67 
  [illegible]                                33       28       30       30       32       28       27 
  Total live stock                        5,167    5,820    6,012    5,799    6,066    6,249    5,208 
Total                                    11,337   11,968   11,480   11,616   11,741   11,851    9,434 








































producers, by no over-production, by large stocks only of cotton and 
wheat and by a drastically reduced money income? 

The effect of the reduced buying power of the farmer on business 
in 1931 is of course obvious. Farmers will not buy as much fertilizer, 
as many automobiles, as much farm machinery in the first half of 1931 
as they did in the first half of 1930, nor will they spend as much on 
clothing and household things. They might, however, buy as much in 
quantity as formerly, provided prices at country stores are reduced as 
much as farm incomes have suffered, or provided sufficient credit is 
advanced. During the second half of 1931 the buying power of the 
farmer will depend more on the out-turn and prices of the 1931 produc- 
tion than on the present depleted incomes. In the drouth-stricken 
areas, however, it may take all of the 1931 incomes to pay up debts and 
restore farm flocks and herds, and the purchases of industrial goods 
might have to wait until 1932. The present widespread reduction in 
the farmers’ money income and in their ability to command credit 
makes it quite unlikely that business as a whole will receive any notice- 
able stimulus from those industries which depend on farmers as cus- 
tomers. 

The price situation, however, does have at least some germs of pos- 
sible favorable influence on business in 1931. The price of cotton has 
gone so low that it has already stimulated an advance in the consump- 
tion of cotton by domestic mills, while industrial activity in general 
has continued to decline. Similar low prices of raw cotton in the fall 
of 1920 and in 1926 made it profitable for mills both here and abroad to 
expand their activity, as may be seen from Chart I. Cotton con- 
sumption in the first half of 1921 rose from the extremely low level of 
the preceding winter approximately to normal levels while industrial 
activity still continued downward. Again in 1927 cotton consumption 
reached record levels while industrial activity continued to decline. 
In both of these instances the large supplies of cotton at low prices pre- 
vented business in general from going lower and helped to lay the 
foundation for recovery. A development similar to these in 1931 
would mean increased employment and earnings to a large number of 
factory workers particularly in the South, and its cumulative effects on 
allied industries might again be one of the factors leading to a general 
revival in business. 

Incidentally we have here one of the clearest examples of the interre- 
lations between agriculture and business. The sequence, starting with 
a decline in business, appears to be as follows: Reduced demand for 
cotton goods, curtailed mill activity, raw cotton prices falling to a level 
where the spread between prices of the manufactured products and the 








CHART I 

Cotton Consumption and Industrial Production in the U.S., 1919-1930 

[Chart: indexes of cotton consumption and of industrial production, per cent of 1923-1925 = 100, plotted 1919-1930.] 





price of the raw material represents a profit, and finally an expansion in 
mill activity and increased demand. In the meantime another se- 
quence goes on, involving the farmer, as follows: Low prices and re- 
duced incomes lead to restricted credit for production, to curtailed pur- 
chases of fertilizer and to reduced acreage planted in the following 
spring. If not offset by good yields, a smaller crop in the fall, together 
with continued active mill consumption, depletes supplies and stim- 
ulates prices. 

Another germ of favorable potentialities for business in 1931 lies 
in the lower prices of food, which partly offset the reduced incomes of 
urban consumers and are making available a relatively larger portion of 
wage earnings for industrial goods. Judging from Chart II, which is a 
record of the alternate periods of high and low agricultural prices since 
1875, we had in 1927-1928 and 1929 a period of relatively high agricul- 
tural prices (at wholesale markets) as contrasted with the low prices 
of 1921-22-23. In the latter period a larger share of consumers’ earn- 
ings went for food than in the earlier period and probably weakened the 
market for industrial goods. And it may easily be that this was one of 
the underlying factors which helped to terminate the recent prosperity. 
For we find that after previous periods of relatively high agricultural 
prices business depressions ensued. As a matter of fact, had more 
prognosticators dared to follow the implications of this record courage- 
ously in 1928, the depression of 1930 and its continuance well into 1931 
might have been more clearly foreseen. Now that agricultural prices 










CHART II 
AGRICULTURAL PRICE CYCLES AND BUSINESS CYCLES 

[Chart: ratio of farm to nonagricultural prices (nonagricultural = 100), plotted to 1930. Bureau of Agricultural Economics.] 


have declined relatively to industrial prices and given amounts of food 
and clothing can be bought for less money, we may infer that this shift 
portends a more favorable demand for industrial goods, particularly on 
the part of the large number whose incomes have not been materially 
affected by the depression. 

There is still another way in which low agricultural prices might 
affect business activity. Ordinarily, unusually low agricultural prices 
might be expected to exert an unfavorable influence on subsequent 
business conditions through the curtailment of acreage and live stock 
production, but the prospects for 1931 may be somewhat different, 
as will be pointed out presently. 

The effect of current agricultural production is to sustain activity 
among those industries that depend on the handling or processing of 
farm products. It has already been shown that the physical volume of 
crop production is only 5 per cent below that of 1929 and this reduction 
is chiefly in corn and hay, which affect business activity mostly through 
the marketing of live stock. If we consider only the net agricultural 
output for 1930, that is, only the volume which is marketed and con- 
sumed in the farm home, it appears that the aggregate net production 
in 1930 was only 2 per cent less than in 1929. A somewhat smaller net 
output of grains, cotton, live stock and hay was nearly offset by in- 
creased net production of fruits, vegetables and wool. A 2 per cent 
reduction in farm output is obviously very small compared with the 
wide fluctuations in industrial activity. 

This characteristic of the total volume of agricultural output— 
namely, that it fluctuates much less than the total volume of industrial 
production—serves as a stabilizing influence on the nation’s business. 







Furthermore, in the past 12 years crop and live stock marketings have 
chanced to be above normal in years when factory production was below 
normal, and vice versa. Particularly, in 1921 and 1922 agriculture 
contributed large physical volume to industrial activity, although its 
purchasing power was greatly reduced. This stabilizing influence may 
be illustrated by the course of production in factories using agricultural 
and non-agricultural materials. If we group the Federal Reserve Board 
index of factory production into two major groups, placing in the first 
foods, textiles, tobacco and leather products and all the rest in the 
second, it appears that since 1920 the fluctuations in production of 
factories using industrial raw materials have been greater than those of 
factories using agricultural raw materials. The latter tended to pre- 
cede the changes in the industrial group between 1919 and 1924 but 
not thereafter. In 1921 and in 1927, factories using agricultural prod- 
ucts were sustaining elements in the general level of business, for their 
output tended to offset the declining output of other industries. For 
example, from the high point of January, 1920, to the low of July, 1921, 
the index of industrial factory production declined 45 per cent, but the 
total including the agricultural group declined 33 per cent. In the 
present depression, the industrial group declined 40 per cent between 
June, 1929, and November, 1930, while the total including the agricul- 
tural group declined 33 per cent. 


CHART III

INDEXES OF PRODUCTION IN FACTORIES USING AGRICULTURAL AND
NONAGRICULTURAL MATERIALS

[Chart not reproduced: monthly indexes in per cent, 1923-1925 = 100, January 1919 onward.]

Agricultural: Federal Reserve Board groups of foods, textiles, tobacco and leather products, to which has been added an index of creamery butter production.
Nonagricultural: all other groups of the Federal Reserve Board index of manufactures, including iron and steel and automobiles.

Source: U. S. Department of Agriculture, Bureau of Agricultural Economics.







In considering further probable effects of agricultural production 
on business in 1931, crop prospects for 1931 need to be taken into 
account. Ordinarily an attempt to predict aggregate crop production in
December of the preceding year would be a most risky venture, and it is
so even this year.

But one cannot be oblivious to the following important facts which 
are likely to affect the production of 1931. Prices of live stock and 
live stock products have suffered, but prices of feeds are low. This 
suggests the possibility of continued expansion in dairy and poultry 
products and in meat animals. In spite of the present very low prices 
of wheat, which are generally considered ruinous to many producers, 
the winter wheat acreage for harvest in 1931 has not been materially 
reduced. In the case of potatoes, prices received for the 1930 crop are 
sufficient to induce expansion in 1931, particularly in the South. Any 
shift that might take place in cotton acreage is likely to show up in 
increased acreages of other crops. Furthermore, the aggregate acreage 
in all crops does not vary very much from year to year. Most of the 
variations in the production of all crops combined, therefore, reflect 
changes in yields per acre. Now the chances are good that
yields per acre, following a year of unusual drouth, will show considerable
improvement, and if this should occur in 1931, it would


be almost a repetition of what happened in the comparable phases of 
the previous major depressions of 1920-21-22, 1913-14-15, 1907-08-09 


CHART IV
FARM PRICES OF FARM PRODUCTS AND INDEX OF RETAIL PRICES OF COMMODITIES FARMERS BUY

[Chart not reproduced: the two series are plotted for the years 1910 to 1930.]







and 1893-94-95. In 1922, the aggregate volume of crop production 
increased 10 per cent over the reduced production of 1921. In 1915, 
the increase was 8 per cent following an increase in 1914. In 1909, pro- 
duction remained at the high level of 1908, and in 1895 the increase 
over 1894 amounted to 14 per cent, or an average of 8 per cent for the 
four years. 

Such an increase in crop production in 1931 would certainly be a 
favorable factor for those industries interested chiefly in large volume, 
but the effect on other enterprises would, of course, hinge on the course 
of prices and the income derived from the larger volume, which in turn 
will be governed to some extent by the buying power of consumers next 
fall and by the conditions of foreign demand. And thus we return to 
the circular relation that exists between agricultural prospects and 
business prospects. 

In concluding this brief paper on a wide subject it would be desirable 
to give a summary conclusion as to the probable net influence of the 
various agricultural elements in the business developments for 1931, 
but this would call for a proper appraisal of the several factors and the
assignment to each of its relative importance, which has not been done.
The most that may be suggested is that during the first half of 1931
business activity will continue to feel the adverse effects of the reduced
buying power of consumers and that this is likely to give way to favorable
influences arising from the farm production of 1931.







A MONTHLY INDEX NUMBER OF WHOLESALE PRICES 
IN THE UNITED STATES FOR 135 YEARS 


By GEORGE F. WARREN AND FRANK A. PEARSON


No monthly index numbers of wholesale prices for a large group of 
commodities are available before 1890. Many students have prepared 
index numbers for certain periods, but no consistent series has ever 
been prepared for a long period. Roelse¹ published an unweighted
yearly index for the period 1791 to 1801. Smith² prepared a monthly
unweighted geometric index number of 33 prices at Boston from 1795
to 1824. Hansen³ prepared an unweighted yearly index number for
the years 1801 to 1840. Cole⁴ published an unweighted monthly
index number of prices of 38 commodities for the years 1825 to 1845,
and an unweighted index for 32 commodities for the years 1843 to
1862. Smith and Montgomery⁵ published a weighted quarterly index
number of prices of 38 commodities covering the period 1859 to 1891.
Mitchell⁶ published a quarterly index number of the median prices of
92 commodities from 1860 to 1880. Snider⁷ published both weighted
and unweighted quarterly index numbers for the period 1866 to 1891.
Miss Bacon⁸ prepared a monthly index number of prices of 106
commodities from 1890 to 1900. The prices were weighted according to
their exchange value in 1909. Snyder⁹ prepared an unweighted
monthly index number of 14 important basic commodities from 1860
to 1926.

The Aldrich report¹⁰ includes three index numbers. All are for
January only. One is a simple average for 223 commodities. Many 
minor items are included; for example, there are 25 series for pocket 

¹ H. V. Roelse, “Wholesale Prices in the United States, 1791 to 1801,” Quarterly Publications of the
American Statistical Association, Vol. 15, New Series, No. 120, December, 1917, pp. 840-846.

² W. B. Smith, “Wholesale Commodity Prices in the United States, 1795-1824,” The Review of
Economic Statistics, Vol. 9, No. 4, October, 1927, p. 183.

³ A. H. Hansen, Wholesale Prices in the United States, 1801-1840, United States Bureau of Labor
Statistics, Bulletin 367, Appendix F, January, 1925, p. 236.

⁴ A. H. Cole, “Wholesale Prices in the United States, 1825-1845,” The Review of Economic Statistics,
Vol. 8, No. 2, April, 1926, p. 76; “Wholesale Commodity Prices in the United States, 1843-1862,” The
Review of Economic Statistics, Vol. 11, No. 1, February, 1929, p. 31.

⁵ J. G. Smith and D. E. Montgomery, “An Aggregative Index of Wholesale Prices, 1859-1866,”
The Review of Economic Statistics, Vol. 7, No. 1, January, 1925, p. 40.

⁶ W. C. Mitchell, Gold Prices and Wages Under the Greenback Standard, 1908, p. 23.

⁷ J. L. Snider, “Wholesale Prices in the United States, 1866-1891,” The Review of Economic Statistics,
Vol. 6, No. 2, April, 1924, p. 112.

⁸ D. C. Bacon, “A Monthly Index of Commodity Prices, 1890-1900,” The Review of Economic
Statistics, Vol. 8, No. 4, October, 1926, pp. 177-183.

⁹ C. Snyder, Business Cycles and Business Measurement, 1927, p. 288.

¹⁰ “Wholesale Prices, Wages and Transportation,” Report by Mr. Aldrich from the Committee on
Finance, 52nd Congress, 2d Session, Report 1394, March, 1893, Part I, p. 93.







knives, giving knives a weight of 11 per cent in the final index. Pocket 
knives, glass, tubs and pails have a weight of 21 per cent in the final 
index, but all farm products have a weight of only 1 per cent. The 
inclusion of these and many more series of prices that fluctuate little 
reduces the flexibility of the index. Another Aldrich index assumes 
31 per cent of the commodities to be constant in price. This is, of 
course, unsuited for any kind of use. The third Aldrich index is 
weighted, but 60 per cent of the index is food and no farm products 
are included except as they are included under food. This is un- 
questionably the best one of the Aldrich index numbers. 
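
The point about implicit weights can be checked with a line of arithmetic. In a simple average every series counts equally, so an item's weight is merely its share of the series. The sketch below uses the two counts given in the text (25 pocket-knife series among 223); the function name is illustrative only.

```python
# Implicit weight of one item in a simple (unweighted) average of price series.
# The counts 25 and 223 are those quoted in the text for the Aldrich simple average.

def implicit_weight(series_for_item: int, total_series: int) -> float:
    """Share of the index carried by an item when every series counts equally."""
    return series_for_item / total_series

knife_weight = implicit_weight(25, 223)
print(f"pocket knives: {knife_weight:.1%} of the index")
```

The result rounds to the 11 per cent stated above, which shows that an "unweighted" index is in fact weighted by the number of quotations collected for each item.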

McNeil¹ calculated annual index numbers based on quarterly prices
given in the Aldrich report. The number of quotations in the different 
groups varied from year to year. For instance, in 1864, when drugs 
and chemicals were high in price, they had a weight of 6 per cent; 
household furnishings, 8 per cent; and farm products, only 9 per cent. 
For 1891, textiles had a weight of 28 per cent and farm products, only 6. 
Over a long series of years, the groups have had different secular trends. 
Group weightings are, therefore, important. 

The Bureau of Labor Statistics now publishes weighted monthly 
index numbers of wholesale prices from 1890 to date. 

Many attempts have been made to connect these odd series in order 
to make a consistent index. 

Considering the wide-spread use that is being made of such index 
numbers, it was felt that a much better index should be prepared. 
The following work was, therefore, undertaken by going back to original 
sources for monthly prices. 

The aim has been to carry back to 1797 each of the 10 groups as 
now published by the Bureau of Labor Statistics, and an average for all 
groups consistent with their all commodity index. 

Most of the prices were obtained from the Shipping and Commercial 
Lists and New York Price Current, New York Spectator, United States 
Treasury Report for 1863, and the Cincinnati Price Current Reporter. 
Several series were furnished by Carl Snyder and by A. H. Hansen. 

Most of the prices are for New York City. Some are for Boston, 
Chicago and Cincinnati. 

Prices for milk, corn, hogs, beans, butter, cheese, fowls, eggs, hay,
cotton and potatoes are averages of the high and low for at least 8
days in the month. Prices in the Treasury Report for 1863 and the
New York Produce Exchange Reports are averages of high and low for 
the month. The rest of the prices are averages of the high and low for 


1 Wholesale Prices, 1890 to 1914, United States Bureau of Labor Statistics, Bulletin 181, October, 
1915, p. 266. 







the fifteenth of the month, or for a day as close to the fifteenth as 
possible. 

Rough checking for consistency was done in all cases. If violent 
changes occurred, the quotations were verified by comparison with 
other days or other sources of information. 

The index number does not include so high a percentage of highly 
manufactured products as does the index prepared by the Bureau of 
Labor Statistics. Therefore, it is probably more flexible. 

Few of the index numbers previously published for the period prior 
to 1890 have a weighting that is at all in accordance with the weighting 
that the Bureau of Labor Statistics uses. 

The commodities included in the unweighted index number prepared 
by Cole for the period 1843 to 1862, result in group weightings some- 
what comparable with the Bureau of Labor Statistics except that 
chemicals have a high weight and no textiles were included. 


TABLE I
WEIGHTINGS OF VARIOUS INDEX NUMBERS

[Table body not recoverable from the source. The surviving fragments show columns for the Bureau of Labor Statistics weightings of 1909 and 1926 among others, group rows that include hides and leather, textiles, fuel and lighting, and metals and metal products, and a total of 100 for each column.]

* An aggregative index. Figures here given are the results for 1879-1880.
† Approximate weightings. These vary in different periods.


During the Civil War period, the great scarcity of cotton made it 
rise so high in price as to distort any index number. For example, 
in August, 1864, cotton rose to fourteen times the pre-war level. 
The farm product index including cotton with a weight of 14 was 326. 
The index with cotton weighted in proportion to consumption was 197. 
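
The effect of the cotton weight may be illustrated by working backward from the two figures just quoted. This is only a sketch under assumptions that the source does not state: that the group index is a weighted arithmetic mean of price relatives, that the weight of 14 means 14 per cent of the farm-products group, and that the cotton relative stood at 1400 (fourteen times a pre-war level of 100). The implied consumption weight that results is illustrative and is not a figure from the paper.

```python
# Back-of-envelope check of the cotton distortion, August 1864.
# ASSUMPTIONS (not in the source): weighted arithmetic mean of relatives,
# weights in per cent of the farm-products group, cotton relative = 1400.

def weighted_index(relatives, weights):
    """Weighted arithmetic mean of price relatives."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

COTTON_RELATIVE = 1400.0               # fourteen times the pre-war level
INDEX_WITH_COTTON_AT_14 = 326.0        # quoted in the text
INDEX_WITH_CONSUMPTION_WEIGHT = 197.0  # quoted in the text

# Average relative of all other farm products implied by the 326 figure.
others = (INDEX_WITH_COTTON_AT_14 * 100 - COTTON_RELATIVE * 14) / 86

# Cotton weight (per cent) that would bring the group index down to 197.
implied_weight = 100 * (INDEX_WITH_CONSUMPTION_WEIGHT - others) / (COTTON_RELATIVE - others)

check_326 = weighted_index([COTTON_RELATIVE, others], [14, 86])
check_197 = weighted_index([COTTON_RELATIVE, others],
                           [implied_weight, 100 - implied_weight])
print(f"other products: {others:.0f}; implied cotton weight: {implied_weight:.1f} per cent")
```

Under these assumptions a single commodity carrying about one-seventh of the weight accounts for well over half of the group index, which is the distortion the reduced weighting was meant to remove.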

For the years 1861 to September, 1871, the weighting of cotton in 
the farm-products group and for cotton goods in the textile and house- 
furnishings groups was reduced to correspond to the actual consump- 
tion of cotton. 







TABLE II
NUMBER OF QUOTATIONS INCLUDED IN THE INDEX NUMBERS

[Table body not recoverable from the source; the one surviving column heading reads "Total number of products including duplicates."]





























The number of commodities at different periods is shown in Table II. 

All index numbers were calculated with a base period of 192 months, 
1876 to 1891. Index numbers were calculated through 1893. 

The Bureau of Labor index numbers from 1890-1930 were con- 
verted to a 1910-1914 base. The two index numbers were then 
connected by using the overlapping period of 48 months, 1890-1893, 
and thereby converting the entire series to a 1910-1914 base. 
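
The conversion and linking just described can be sketched as follows. The paper does not give the exact arithmetic, so the sketch assumes that a series is rebased by dividing by its mean over the base period, and that the two series are linked by the ratio of their means over the overlapping period, with the later series used for the overlap itself. The data are hypothetical and annual for brevity, whereas the actual series are monthly.

```python
from statistics import mean

def rebase(series, base_keys):
    """Rescale a series so that its mean over base_keys equals 100."""
    base = mean(series[k] for k in base_keys)
    return {k: 100.0 * v / base for k, v in series.items()}

def splice(early, late, overlap_keys):
    """Put `early` on the base of `late`, using the ratio of their means
    over the overlap (an assumed linking rule, not stated in the paper)."""
    factor = mean(late[k] for k in overlap_keys) / mean(early[k] for k in overlap_keys)
    linked = {k: v * factor for k, v in early.items() if k not in late}
    linked.update(late)  # the later series is kept wherever it exists
    return linked

# Hypothetical figures for illustration only.
early = {1888: 80.0, 1889: 82.0, 1890: 84.0, 1891: 80.0, 1892: 76.0, 1893: 80.0}
late = {1890: 105.0, 1891: 100.0, 1892: 95.0, 1893: 100.0, 1894: 90.0, 1895: 92.0}
overlap = [1890, 1891, 1892, 1893]

linked = splice(early, late, overlap)
for year in sorted(linked):
    print(year, round(linked[year], 1))
```

Because the link uses the whole overlap rather than a single month, an accidental fluctuation in one quotation has little influence on the level of the earlier series.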

Announcement will be made in a later number of this JOURNAL
as to the time and place of publication of the final index numbers. 


In England, in the Napoleonic War period, prices rose to an index 
of 227 for 1810. Twelve years later, they fell to 122, or declined 
nearly one-half. If monthly index numbers were available, the de- 
cline would doubtless be more than one-half. The index here pre- 
sented for American prices in the Civil War Period shows that in 
September, 1864, prices had risen to 240. Twelve years later, they 
stood at 119, or had declined one-half. In the World War period, 
prices rose to 244 in May, 1920. In less than eleven years, they 
had fallen to 117. Chart I shows the comparison of the inflation and 
deflation periods of the World War and Civil War periods. 
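
The three declines may be compared directly from the index figures quoted in this paragraph. The short calculation below uses only those figures; the labels are descriptive and not taken from the source.

```python
def pct_decline(peak, trough):
    """Decline from peak to trough as a percentage of the peak."""
    return 100.0 * (peak - trough) / peak

# (peak, trough) index numbers as quoted in the text.
episodes = {
    "England, 1810 and twelve years later": (227, 122),
    "United States, September 1864 and twelve years later": (240, 119),
    "United States, May 1920 and less than eleven years later": (244, 117),
}

declines = {name: pct_decline(p, t) for name, (p, t) in episodes.items()}
for name, d in declines.items():
    print(f"{name}: {d:.1f} per cent")
```

The figures come to about 46, 50 and 52 per cent, in keeping with the statement that prices fell by roughly one-half in each case.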

The decline in England after the Napoleonic Wars lasted thirty-nine 
years. In America, it began two years later and lasted twenty-nine 
years. The decline in America and in England after the Civil War
lasted thirty-two years, until new gold mines were discovered.

After sixteen years of falling prices in America, prices were tempo- 
rarily higher from 1835 to 1840. Also, after fourteen years of falling 
prices, there was a temporary rise from 1880 to 1884. But in neither 


¹ G. F. Warren and F. A. Pearson, The Agricultural Situation, August, 1924, p. 263.







CHART I
WHOLESALE PRICES IN THE UNITED STATES, WORLD WAR AND CIVIL WAR PERIODS*

[Chart not reproduced: index curves for the two periods on a vertical scale marked at 50, 100 and 150; the horizontal axis is marked 1860 to 1874 for the Civil War period and 1914 to 1930 for the World War period.]


* There is great similarity in the declines after each War except that the two major declines were 
more violent this time. The writers expect the similarity to continue. 


case was the rise permanent until new gold was found. In each of 
these cases, the secondary rise came after liquidation of the building 
boom. If previous experiences are repeated, as seems probable, the 
long years of price decline will be broken by some rise in prices in 
about five years, but no permanent recovery is to be expected until new 
gold is found. 

The panics of 1825 and 1873 were similar to the present one in that 
city real estate had to be liquidated to a new cost basis. This is a slow 
process. 

When inflation occurs, wages lag. Food rises in price. The cost 
of distribution remains low; hence farmers receive an unusually large 
proportion of an unusually high price. This results in a low demand 
for buildings and little construction. 

When deflation occurs, wages continue to rise for some time. Costs 
of distributing food remain high. Hence farmers receive an abnormally 
small percentage of the falling retail prices. The abnormally small 
transfers of money from cities to farmers result in an abnormal demand 
for buildings following abnormally low construction. 

A period of financial inflation and deflation inevitably results in an 
agricultural depression and a building boom.¹ Over-building
inevitably results in depression. No striking change has taken place in
the last century in the time required to start such a boom, or the time 
necessary to overdo it. Following the War of 1812, the building 
boom broke eleven years after deflation began. Following the Civil 
War, and the World War, it was, in each case, nine years from the time 
deflation began until the building boom broke. Buildings in cities 
cannot be increased or decreased quickly. If too many are made, it 
takes several years for population to grow up to them. 


¹ G. F. Warren and F. A. Pearson, The Agricultural Situation, August, 1924, pp. 27, 29, 242 and 270.







Following the panic of 1825, mild recovery came in two years, but 
real recovery came in five years. Following the panic of 1873, an 
extremely prosperous period began six years later. 

Some increase in business activity is generally forecasted for this 
spring, but this is likely to be very moderate. A period of very active 
business is to be expected about 1935. 

Present indications are that the trend of prices will be down until 
new gold mines are found. We may also expect many of the economic 
and political disturbances that occurred in the nineties to be repeated. 
Also, there is danger of vigorous agitation for a money less stable than 
gold. The only alternative is the adoption of some kind of stable 
measure of value, or the discovery of gold. Those who wish for social 
stability should be the ones who are most interested in this problem. 
Nothing so undermines all standards as the agricultural distress, 
enormous business losses and drastic unemployment that result from 
such terrible declines in commodity prices as occurred in 1920-1921 
and in 1929-1930. 

The present efforts to relieve unemployment by public construction 
are a new element in the situation. Such efforts are desirable and 
will make conditions better for a time than they otherwise would be, 
but the injurious effects of a long period of falling prices cannot be 
cured permanently in this way. 







CORRELATION AND ASSOCIATION 


By EDWIN B. WILSON


Correlation, correlated and derivatives are words used technically 
both in the general sense implying merely some mutual relationship— 
apparent rather than necessarily causal—between variables or attri- 
butes, and in the special sense involving the numerical determination of 
one or more correlation coefficients. It might be better to use associa- 
tion in the former case, reserving correlation for the latter, and this 
distinction will here be preserved. 

Association.—If two attributes or characters be tabulated for a 
population of N objects according to their possession by each of the 
objects, one will inevitably have a four-fold classification of the popu- 
lation N: 

(1) $\alpha$ objects with attribute I and attribute II,

(2) $\beta$ objects with attribute I but without II,

(3) $\gamma$ objects without attribute I but with II,

(4) $\delta$ objects without I and without II,
with $\alpha+\beta+\gamma+\delta=N$, the four classes being mutually exclusive but
exhaustive. The tabulation may be arranged as a four-fold table:


               With I            Without I         Totals

With II        $\alpha$          $\gamma$          $\alpha+\gamma$
Without II     $\beta$           $\delta$          $\beta+\delta$

Totals         $\alpha+\beta$    $\gamma+\delta$   $N$               (1)























The attributes may be such as “fair” and “not fair” (or dark) with
respect to complexion and “tall” and “not tall” (or short) for stature.
Sometimes the attributes are capable of being quantified through the 
use of a scale, as is the case with instances above, but are classified in 
dichotomy for convenience; in some cases such as the dichotomies 
“dead” and “alive” or “male” and “female” quantification through 
adoption of a scale would be difficult if not impossible, or at any rate, 
for the problem under consideration, impractical. 

The association table may be converted from enumerations into 
“chances” of occurrence by dividing by the population N. Thus 


               With I        Without I

With II        $\pi$         $p_2-\pi$           $p_2$
Without II     $p_1-\pi$     $1-p_1-p_2+\pi$     $1-p_2$

               $p_1$         $1-p_1$             $1$               (2)































where $\pi = \alpha/N$ is the “chance” of both,

and $p_1 = (\alpha+\beta)/N$ is the “chance” of I,

and $p_2 = (\alpha+\gamma)/N$ is the “chance” of II.

The attributes will be called independent or unassociated if the
“chance” of both ($\pi$) equals the product $p_1 p_2$ of the “chances” of each.
This but takes over the definition of independence from the theory of
frequency or probability. The attributes will be called positively
associated if the frequency of both exceeds the product of the frequencies
of each ($\pi > p_1 p_2$), negatively associated (or dissociated) if it is less
($\pi < p_1 p_2$), and in general associated (without specification of sign) in
all cases where $\pi$ and $p_1 p_2$ differ.

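The condition of association may be expressed in the original numbers as

$$\frac{\alpha}{\beta} \gtrless \frac{\gamma}{\delta} \quad\text{or}\quad \frac{\alpha}{\gamma} \gtrless \frac{\beta}{\delta} \quad\text{or}\quad \alpha\delta \gtrless \beta\gamma. \qquad (3)$$

These conditions are symmetric in that $\beta$ and $\gamma$ may be interchanged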
and they therefore, like the original definitions, show that association 
is a mutual relationship of the two attributes, neither having any 
precedence over the other. In careless speech one often speaks prefer- 
entially or asymmetrically of attribute I as being associated with II or 
of II as being associated with I. There is no harm done if one realizes 
that he means nothing asymmetrical by the form of statement. The 
temptation, however, is to interpret the statement as denoting a real 
asymmetry whereby the presence of one character shows a predisposition 
to the other as though the presence of the latter were “caused” by
that of the former. When two characteristics are in causative rela- 
tionship there is inevitable asymmetry as between the causing and the 
caused, but this asymmetry is an imputation of scientific theory or of 
personal opinion imposed upon the interpretation of the facts rather 
than inhering in the observed facts themselves. Theory is very im- 
portant in indicating what facts should be looked for as significant; 
facts are significant or important largely as they indicate theory, but 
neither compels the other, as the histories of theorizing and of fact- 
finding amply demonstrate. 
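
The definition of association and its expression in the original counts can be compared in a short sketch. The function below takes the four counts of a four-fold table (both attributes, I only, II only, neither), judges the sign of the association once from the chances and once from the cross-products, and confirms that the two agree. The function name and the sample counts are illustrative only.

```python
from fractions import Fraction

def association(both, only_i, only_ii, neither):
    """Sign of the association in a four-fold table of counts."""
    n = both + only_i + only_ii + neither
    pi = Fraction(both, n)              # chance of both attributes
    p1 = Fraction(both + only_i, n)     # chance of attribute I
    p2 = Fraction(both + only_ii, n)    # chance of attribute II

    # Sign judged from the chances: pi against the product p1 * p2.
    by_chances = (pi > p1 * p2) - (pi < p1 * p2)
    # Sign judged from the counts: cross-product of the diagonal cells.
    by_counts = (both * neither > only_i * only_ii) - (both * neither < only_i * only_ii)
    assert by_chances == by_counts      # the two conditions are equivalent

    return {1: "positive", 0: "none", -1: "negative"}[by_counts]

print(association(10, 6, 6, 10))
print(association(5, 5, 5, 5))
print(association(1, 9, 9, 1))
```

Interchanging the second and third arguments leaves the answer unchanged, which is the symmetry remarked upon above: the table itself gives neither attribute precedence over the other.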
In many a problem there arises a four-fold table that is logically 
asymmetric. For example, one may examine the distribution of some 
attribute in one experimental population for comparison with its 







“control” of normals, as when one compares the percentage of cures 
among patients given a new medical or social treatment with the per- 
centage of cure among those given another treatment or none at all. 
The tabulation would be: 


               A                 B                 Totals
Cured          $\alpha$          $\gamma$          $\alpha+\gamma$
Not cured      $\beta$           $\delta$          $\beta+\delta$
Totals         $\alpha+\beta$    $\gamma+\delta$   $N$


























where A represents the new treatment and B the old (or none). The
totals $\alpha+\beta$ and $\gamma+\delta$ correspond to the real numbers involved and the
ratios $\alpha:\beta$ and $\gamma:\delta$ have meanings; but there is no real sampling of any
double universe for the whole table, rather is there the sample of $\alpha+\beta$
from the “experimental” universe A and the sample $\gamma+\delta$ from the
“control” universe B, and often $\alpha+\beta$ are all the cases treated, i.e., the
whole universe A actually existent, whereas $\gamma+\delta$ is a presumably fair
sample of the normal (or other) controls. The totals $\alpha+\gamma$ and $\beta+\delta$
are therefore not particularly real. Still the problem may be treated
as an association.

It is essential to observe that for the discussion of the association of 
two attributes a four-fold classification is necessary. It is not necessary
that the numbers $\alpha$, $\beta$, $\gamma$, $\delta$ entered in the table be directly observed;
the four numbers $\alpha$, $\alpha+\beta$, $\alpha+\gamma$, $N$ representing respectively
the objects with both attributes and with each and the total population
will serve equally well, but four independent numbers there must
be from which $\alpha$, $\beta$, $\gamma$, $\delta$ may be determined if desired. It is further
essential to observe that the association of two attributes is a property 
of the population N tabulated; if the numbers are large enough so that 
statistical fluctuations could not well alter the (positive, negative or 
nil) association found, then the association may be affirmed of any 
universe from which the sample N may be considered as a random 
drawing; there is no such thing as association of attributes per se in- 
dependent of some universe of discourse, specified or not. 
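
The remark that any four independent numbers suffice can be made concrete. The sketch below recovers the four cells of the table from the number having both attributes, the number having each, and the total population; the function name and the sample figures are illustrative only.

```python
def cells_from_totals(both, with_i, with_ii, n):
    """Recover the four cells (both, I only, II only, neither) from the
    count with both attributes, the count with each, and the total."""
    only_i = with_i - both
    only_ii = with_ii - both
    neither = n - with_i - with_ii + both
    if min(both, only_i, only_ii, neither) < 0:
        raise ValueError("the four numbers are not mutually consistent")
    return both, only_i, only_ii, neither

# Example: 10 objects with both attributes, 35 with I, 11 with II, 46 in all.
print(cells_from_totals(10, 35, 11, 46))
```

If only three numbers were reported, for instance the two marginal totals and the population, the cell for both attributes would remain free and no association could be judged.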

It is rare that a population is envisaged in which two characters 
alone are significant. In general several relatively observable char- 
acters, and possibly many hidden ones, are involved. In the effort to 
disentangle the multiplicity, recourse must often be had to partial 
association, i.e., to association in subgroups of the whole. If there be 
three characters there will be eight independent divisions of the popu- 
lation necessary for the discussion of the association of each pair of 
characters in the groups where the third is present or is absent. In 







general, if there are $n$ characters under discussion, there will be $2^n$ independent
groupings possible and necessary exhaustively to classify the
individuals by presence or absence of each of the $n$ characters. If the
case of three characters be examined from the three four-fold tables
like (1), made for each of the three pairs of characters (I, II), (II, III),
(I, III), twelve groupings will in all be made, but from them no exhaustive
classification is possible because they do not yield $2^3=8$ independent
numbers; there is, for instance, no way to determine how many
of the population have all three characters; the logical analysis of the 
situation relative to the characters is impossible. Many of the tabula- 
tions printed in the literature are of this inadequate sort, although 
adequate tabulation would have been as easy or easier. 
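
That the three pairwise tables cannot replace the full eight-fold classification may be shown by a small constructed example. The two hypothetical populations below, each of eight individuals, give identical four-fold tables for every pair of characters, yet differ in the number possessing all three.

```python
from itertools import product

def pair_table(counts, a, b):
    """Four-fold table for characters a and b, summing over the third."""
    table = {}
    for cell, n in counts.items():
        key = (cell[a], cell[b])
        table[key] = table.get(key, 0) + n
    return table

# Each cell is (I, II, III) with 1 = present, 0 = absent: 2**3 = 8 cells.
cells = list(product((0, 1), repeat=3))

# Two hypothetical populations of eight individuals each.
uniform = {c: 1 for c in cells}
lumpy = {c: (2 if sum(c) % 2 == 0 else 0) for c in cells}

pairs = [(0, 1), (1, 2), (0, 2)]
same_pair_tables = all(
    pair_table(uniform, a, b) == pair_table(lumpy, a, b) for a, b in pairs
)

print("pairwise tables identical:", same_pair_tables)
print("with all three characters:", uniform[(1, 1, 1)], "against", lumpy[(1, 1, 1)])
```

The twelve groupings of the three pairwise tables are thus consistent with more than one complete classification, which is why the full tabulation must be printed if partial association is to be studied.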

Possibility of “Measuring” Association.—As it is clear that association
is a matter of degree varying from negative to positive, one is
tempted to assign a number to that degree. Several suggestions have
been made. Yule has defined

$$Q = \frac{\alpha\delta-\beta\gamma}{\alpha\delta+\beta\gamma} \quad\text{and}\quad \omega = \frac{\sqrt{\alpha\delta}-\sqrt{\beta\gamma}}{\sqrt{\alpha\delta}+\sqrt{\beta\gamma}} \qquad (4)$$

as, respectively, the “coefficient of association” and the “coefficient of
colligation”; each vanishes for no association ($\alpha\delta=\beta\gamma$), each lies always
between $-1$ and $+1$ as does the usual correlation coefficient. Subsequently,
however, he came to the conclusion that neither of these nor
either of two others that had been suggested were really safe indexes of
the amount of association. Indeed for one of these others, namely,

$$\rho = \frac{\alpha\delta-\beta\gamma}{\sqrt{(\alpha+\beta)(\alpha+\gamma)(\delta+\beta)(\delta+\gamma)}} \qquad (5)$$

when compared with $\omega$ there may arise inconsistent results, as may be
seen by computing $\omega$ and $\rho$ for the two tables
(1)    10 |  6              $\omega = .250$
                   where
        6 | 10              $\rho = .250$

(2)    10 |  1              $\omega = .333$
                   where
       25 | 10              $\rho = .195$





If $\omega$ be taken as a measure of the degree of association, table (2) shows
a considerably higher degree of association than table (1); whereas if $\rho$
be taken as the measure, table (2) shows a considerably lower degree of 
association than does (1). 
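
The figures quoted for the two tables can be reproduced directly. The sketch below computes the coefficient of association, the coefficient of colligation and the coefficient of equation (5) from the four cells, reading each table in the layout of table (1) with the first row as both and II only and the second row as I only and neither. The function and variable names are illustrative only.

```python
from math import sqrt

def yule_q(a, b, g, d):
    """Coefficient of association, from the cells a, b, g, d."""
    return (a * d - b * g) / (a * d + b * g)

def colligation(a, b, g, d):
    """Coefficient of colligation."""
    return (sqrt(a * d) - sqrt(b * g)) / (sqrt(a * d) + sqrt(b * g))

def rho(a, b, g, d):
    """Coefficient of equation (5)."""
    return (a * d - b * g) / sqrt((a + b) * (a + g) * (d + b) * (d + g))

# Cells in the order (both, I only, II only, neither).
table_1 = (10, 6, 6, 10)
table_2 = (10, 25, 1, 10)

for name, t in (("table (1)", table_1), ("table (2)", table_2)):
    print(f"{name}: Q = {yule_q(*t):.3f}, "
          f"colligation = {colligation(*t):.3f}, rho = {rho(*t):.3f}")
```

The colligation rises from .250 to .333 while the coefficient of equation (5) falls from .250 to .195, so the two measures rank the same pair of tables in opposite order.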

















The best present inference seems to be that there is no satisfactory 
general measure of the degree of association, that any measure which 
may be adopted for any particular series of cases must be justified as 
suited to give sound estimates of the comparative degrees of association 
in that series. For most statistical problems the important thing is 
rather the statistical significance of the fact of association than any 
numerical measure of its amount. This statistical significance must be 
judged by some form of probable error (or its equivalent) suited to the 
problem, having regard to the nature of the sampling process by which 
the association table is imagined to be drawn from the universe. As 
few tables actually arise from a random drawing of the simple lottery 
type, it is desirable to form a number of similar tables so that one may 
judge of the actual amount of variation from table to table rather than 
depend upon a theoretical formula for the probable error. 

Correlation.—As ordinarily treated correlation is a branch of least 
squares. If X and Y are two variables representing numerically two 
characteristics on certain scales of a population of N individuals and 
the mean values of $X$ and of $Y$ for the group be $X_m$ and $Y_m$ respectively,
the residual differences $x = X - X_m$ and $y = Y - Y_m$ may be formed, and
the standard deviations squared, which may be called the variances
of $X$ and $Y$ or of $x$ and $y$, are $\sigma_x^2$ and $\sigma_y^2$, defined by

$$N\sigma_x^2 = \Sigma (X - X_m)^2 \quad\text{and}\quad N\sigma_y^2 = \Sigma (Y - Y_m)^2 \qquad (6)$$

where the sign $\Sigma$ indicates summation over the whole population $N$.
The mean value has the property that the variance calculated from
the residuals relative to it is less than would be obtained by using any
other value. If the sum of the variables $X+Y$ is found for each
member of the group, the mean of $X+Y$ is the sum $X_m+Y_m$ of the
means and the residuals of $X+Y$ relative to $X_m+Y_m$ are the sums
$x+y$ of the residuals. The variance of the sum is calculable from

$$N\sigma_{x+y}^2 = \Sigma (x+y)^2 = \Sigma x^2 + 2\Sigma xy + \Sigma y^2. \qquad (7)$$

The first and last terms are by definition merely $N\sigma_x^2$ and $N\sigma_y^2$ respectively,
but the middle term $2\Sigma xy$ is a new statistical constant of
the group; it is twice the sum of the products of corresponding residuals.


It is customary to write

$$\Sigma xy = N r \sigma_x \sigma_y, \qquad (8)$$

namely, as the product of the number $N$ of the population by a coefficient
$r$ and the standard deviations $\sigma_x$ and $\sigma_y$ of the variables, and
to call $r$ the correlation coefficient.

Then

$$\sigma_{x+y}^2 = \sigma_x^2 + 2 r \sigma_x \sigma_y + \sigma_y^2 \qquad (9)$$

and

$$r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}. \qquad (10)$$
Thus the correlation coefficient r enters algebraically into the expres- 
sion for the variance of a sum, and may in fact be determined from the 
variance of that sum taken with the variances of the two variables. 
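
The statement that the correlation coefficient may be had from the variance of the sum can be verified numerically. The sketch below computes the coefficient in two ways for a small set of hypothetical values: from the variance of the sum together with the two separate variances, and from the sum of products of residuals. Variances are taken with divisor N, as in the definitions above.

```python
from math import sqrt

def variance(values):
    """Mean squared residual about the mean (divisor N)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def r_from_sum(xs, ys):
    """Correlation coefficient from the variance of X + Y and the variances of X and Y."""
    var_sum = variance([x + y for x, y in zip(xs, ys)])
    vx, vy = variance(xs), variance(ys)
    return (var_sum - vx - vy) / (2 * sqrt(vx * vy))

def r_from_products(xs, ys):
    """Correlation coefficient from the sum of products of residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (n * sqrt(variance(xs) * variance(ys)))

# Hypothetical measurements for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

print(round(r_from_sum(xs, ys), 6), round(r_from_products(xs, ys), 6))
```

Both routes give the same value, so that a table of the sums alone, with the two separate variances, is enough to determine the coefficient.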

If the variables X and Y be taken as abscissas and ordinates in a 
diagram, the two values for any member of the group of N give a point 
(X, Y) on the diagram and the whole population is represented with 
respect to its two characters under consideration by N points in the 
diagram. The group of N points is called a scatter or correlation 
diagram. One may fit by least squares a straight line, $Y = mX + b$, to
these points in such manner as to make the sum of the squares of the
differences $Y - (mX + b)$ least. The conditions obtained by the method
lead to the equation

$$y = \frac{\Sigma xy}{\Sigma x^2}\, x = r\,\frac{\sigma_y}{\sigma_x}\, x, \qquad (11)$$

showing again the involvement in the least squares solution of the
product $\Sigma xy$ or of the correlation coefficient $r$. Such an equation is
called a regression equation, and in particular the regression equation
of $Y$ on $X$; there is a correlative regression equation of $X$ on $Y$, namely,

$$x = r\,\frac{\sigma_x}{\sigma_y}\, y. \qquad (12)$$


These two equations represent different straight lines on the diagram 
and mean different things; the first estimates Y when X is known and 
the second estimates X when Y is known. Thus, if for a group of
persons the regression of weight on height is known, the equation gives
an estimated (correlative) weight W for any given height H. However, if
it were the weight W that were known, the correlative height H′ would
be different and would in fact be nearer the mean height $H_m$ than H.
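
The difference between the two regression lines can be seen in a small numerical sketch. It is an illustration under assumed data: the heights and weights are hypothetical and the names are not from the paper. A height H is converted into an estimated weight W, and that weight is then converted back into an estimated height, which falls nearer the mean than H.

```python
def regression_slopes(xs, ys):
    """Return (mean_x, mean_y, slope of Y on X, slope of X on Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sum_xy / sum_xx, sum_xy / sum_yy


# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [60.0, 62.0, 65.0, 68.0, 70.0]
weights = [110.0, 120.0, 140.0, 155.0, 160.0]
mean_h, mean_w, w_on_h, h_on_w = regression_slopes(heights, weights)

H = 70.0
W = mean_w + w_on_h * (H - mean_h)       # weight estimated from height H
H_back = mean_h + h_on_w * (W - mean_w)  # height estimated from that weight
print(H, W, H_back)
```

The product of the two slopes is r², so the round trip shrinks the deviation from the mean height by the factor r², which is less than one unless the correlation is perfect.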

Correlation is a mutual affair of two numerical variables; the corre- 
lation coefficient r is symmetrical with respect to them. Strictly, 
Y is not correlated with X or X with Y, but X and Y are correlated— 
comparably to the remarks made under association, and similar re- 
marks concerning causality may be considered as here repeated rela- 
tive to correlation. Further, the value of the correlation coefficient r 
depends on the group for which it is determined or on the universe of 
which that group is a fair sample. The correlation coefficient r of 
height and weight for a group containing humans from infancy to adult 
life would be different from, and in fact greater than, the coefficient
for college students or for the members of a football squad; there is no
such thing as the correlation coefficient of height and weight per se. 
Furthermore, as the correlation coefficient depends on the actual num- 
erical values involved, it will depend not merely upon the individuals 
from whom the numerical values are obtained by measurement or by 
estimate but also upon the method of measurement or estimate, and, 
in general, inaccuracy of determination will reduce the value of the 
coefficient. 

The reliability coefficient for a series of measurements or estimates is 
the coefficient of correlation of two sets of determinations of that 
series. If two persons with similar devices weigh N individuals, the
correlation of the two sets of weights will be very high, for the reliability
of weighing is usually high; if two persons grade the written or oral
examinations of N individuals in a subject, the correlation of the two 
sets of grades may be low. If an attempt is made to interpret the 
correlation between marks in two subjects for a group of individuals, 
allowance should be made for the attenuating effect of the lack of
reliability of the grading in the two subjects. It is necessary to have two
series in each subject, X and X′ for one, say political science, and Y
and Y′ for the other, as economics. The coefficients $r_{XX'}$ and $r_{YY'}$ are
the reliability coefficients of the grading in the two subjects and depend
on the examiners and examinees, being likely to be higher if the class
is highly variable in accomplishment than if it is selected so as to be all
“honor” or all “flunker” students. There will be four possible
correlation coefficients between the two subjects, namely, $r_{XY}$, $r_{XY'}$, $r_{X'Y}$,
$r_{X'Y'}$, of which the mean, arithmetic or geometric or other, might be
taken as the best estimate of the correlation r. This best estimate of r
divided by the geometric mean of the reliability coefficients of the two
variables

$$r_c = r \div \sqrt{r_{XX'}\, r_{YY'}}$$


is called the correlation coefficient corrected for attenuation. 
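
As a worked illustration of the correction, the sketch below divides an observed correlation by the geometric mean of two reliability coefficients. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Observed correlation divided by the geometric mean of the reliabilities."""
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)


# Hypothetical values: mean of the four cross correlations, and the two
# reliability coefficients of the grading in each subject.
r_mean = 0.42
r_xx = 0.81
r_yy = 0.64
print(correct_for_attenuation(r_mean, r_xx, r_yy))
```

With these assumed figures the geometric mean of the reliabilities is 0.72, so the corrected coefficient is 0.42 / 0.72, somewhat larger than the observed one; with perfectly reliable grading the correction leaves r unchanged.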

When more than two variables are considered, one has recourse
to partial correlation, which is computed by a formula. Thus for a
group of persons one may record weight W, height H and age A.
There will be three coefficients $r_{WH}$, $r_{WA}$, $r_{HA}$. The partial correlation
of weight and height for the group, allowance being made for age, is

$$r_{WH.A} = \frac{r_{WH} - r_{WA}\, r_{HA}}{\sqrt{1 - r_{WA}^2}\,\sqrt{1 - r_{HA}^2}}$$

and similarly for $r_{WA.H}$ and $r_{HA.W}$. If four variables are involved the
formulas become more complicated.
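
The partial correlation formula for three variables is easily put into a short routine. The sketch below is illustrative only: the function name and the three input coefficients are hypothetical.

```python
from math import sqrt


def partial_correlation(r_wh, r_wa, r_ha):
    """Correlation of W and H with allowance made for A (first-order partial)."""
    return (r_wh - r_wa * r_ha) / (sqrt(1.0 - r_wa ** 2) * sqrt(1.0 - r_ha ** 2))


# Hypothetical coefficients for weight, height and age in a mixed-age group.
print(partial_correlation(0.80, 0.60, 0.60))
```

With these assumed values the numerator is 0.80 − 0.36 = 0.44 and the denominator is 0.8 × 0.8 = 0.64, so the partial coefficient is lower than the total coefficient of 0.80, part of which was due to the common dependence on age.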







As the matter of correlation is a branch of least squares, there are 
reservations which may be held in respect to its application to cases 
in which fitting linear equations to the data might seem inappropriate, 
whether a priori or a posteriori. Thus arises a necessity for the con- 
sideration of various other estimates of relatedness. 

As the correlation coefficient is a proper fraction some speak of r = .60
as a correlation of 60 per cent. One should bear in mind that it is the
variances, i.e., the means of the squares of the deviations, which are
combined by addition, and not the correlation coefficients. The
outstanding variance of $v = Y - (mX + b)$, the residuals from the
regression equation, is $\sigma_v^2 = \sigma_Y^2 (1 - r^2)$. Hence if r = .60 the variance
uncontrolled by the knowledge of X is 64 per cent of the original vari- 
ance of Y. It would be better to say that a correlation r=.60 repre- 
sents 36 per cent control and 64 per cent lack of control, but still 
better to avoid any such form of speech. 
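
The arithmetic behind the 36 and 64 per cent figures can be tabulated for a few values of r. This is a small sketch with a hypothetical function name, using only the relation that the residual variance is (1 − r²) times the variance of Y.

```python
def uncontrolled_share(r):
    """Fraction of the variance of Y left after regression on X: 1 - r^2."""
    return 1.0 - r * r


for r in (0.30, 0.60, 0.90):
    controlled = r * r
    print(f"r = {r:.2f}: controlled {controlled:.2f}, "
          f"uncontrolled {uncontrolled_share(r):.2f}")
```

For r = .60 the table gives 0.36 and 0.64, as stated in the text.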

The theory of errors is related sometimes to the so-called normal 
law of distribution; so the theory of correlation may be related to the 
normal frequency distribution in two variables, but correlation has 
been widely used in biology, psychology, anthropology, education, 
economics, sociology and the public health without much attention to 
normality of distribution. 

When the variables involved in a problem are tabulated numerically 
and when it is appropriate and desirable to express the relations be- 
tween one of them and the others in a linear equation, the methods of 
correlation, including partial correlation, are indispensable. When 
the variables are attributes, present or absent, the analysis must pro- 
ceed by the method of association. In many cases where some of the 
variables are quantitatively expressed, it is useful to classify each of 
those into a greater and a lesser category and proceed with an analysis 
by association to explain the general indications respecting dependency 
between variables before determining what correlations should be 
computed. 

References: For association and correlation, G. H. Yule, An Intro- 
duction to the Theory of Statistics, Griffin, London. For reliability and 
attenuation, T. L. Kelley, Statistical Method, Macmillan, New York; 
R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, 
London. 







MULTIPLE CORRELATION FOR PREDICTION PURPOSES ¹


By DINSMORE ALTER


In this paper the paradox will be demonstrated that the best com- 
bination of periodicities to represent the data from which they have 
been derived is not, in general, the combination which will predict best. 
A modified form of multiple correlation theory is developed to give the 
predictions of greatest probability. 

Given observed data $x_0, x_1, x_2, \ldots, x_{n-1}$, any one of which is
designated by $x_i$. Let $x_{ic}$ be a computed estimate of $x_i$ from the various
periodicities $p_1, p_2, \ldots, p_l$, assumed to satisfy the linear equation

$$x_{ic} = a_1 p_{i1} + a_2 p_{i2} + a_3 p_{i3} + \ldots + a_l p_{il}. \qquad (1)$$

We desire the constants a to be so chosen that the sum of the squares
of the residuals $(x_i - x_{ic})$ be a minimum. This procedure gives us the
ordinary multiple correlation theory, but it may be well to outline it
briefly here.

$$\frac{\partial}{\partial a_1}\left[\Sigma (x_i - x_{ic})^2\right] = 0, \qquad (2)$$

$$(x_i - x_{ic})^2 = (x_i - a_1 p_{i1} - a_2 p_{i2} - \ldots - a_l p_{il})^2.$$


With similar equations for each of the a’s, let

$$\sigma_{p_k}^2 = \frac{\Sigma p_{ik}^2}{n} \quad\text{and}\quad r_{p_h, p_k} = \frac{\Sigma p_{ih}\, p_{ik}}{n\,\sigma_{p_h}\sigma_{p_k}}. \qquad (3)$$


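Then equations (2) may be written as

$$r_{x,p_1}\sigma_x = a_1\sigma_{p_1} + a_2 r_{p_1,p_2}\sigma_{p_2} + a_3 r_{p_1,p_3}\sigma_{p_3} + \ldots$$
$$r_{x,p_2}\sigma_x = a_1 r_{p_1,p_2}\sigma_{p_1} + a_2\sigma_{p_2} + a_3 r_{p_2,p_3}\sigma_{p_3} + \ldots \qquad (4)$$
$$r_{x,p_3}\sigma_x = a_1 r_{p_1,p_3}\sigma_{p_1} + a_2 r_{p_2,p_3}\sigma_{p_2} + a_3\sigma_{p_3} + \ldots$$

with as many equations as there are coefficients a.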

The n data may be arranged in tables of $p_k$ columns, so that each
column mean is the best prediction as given by any one periodicity $p_k$
for the epoch which is to fall $p_k$ datum intervals after the last value
included in the column. These means having been formed for each
periodicity, the correlation coefficients and the standard deviations of
equations (4) may be computed from the column means and the
equations solved for the various a’s. These constants will give the best
combination of the periodicities to represent the data from which they
have been derived.

¹ The writer wishes to acknowledge the fact that this study was made possible by the fellowship
granted to him by the John Simon Guggenheim Memorial Foundation for study and research in England.


Let

$$R_{p_k} = \frac{\Sigma x_i\, x_{i+p_k}}{\sqrt{\Sigma x_i^2\;\Sigma x_{i+p_k}^2}}. \qquad (5)$$

These R’s are the ordinates of the correlation periodogram. The
sums in the denominator have $n - 2p_k$ values in common and the
non-common terms differ by accidental quantities, so that we may write


Any datum $x_i$ may be considered as composed of two parts, of which
one, $y_{j,k}$, repeats after $p_k$ and the other is uncorrelated after this
interval.

We shall write

$$x_i = y_{j,k} + \Delta_{i,k}$$
$$x_{(i-sp_k)} = y_{j,k} + \Delta_{(i-sp_k),k}, \qquad (7)$$

where s is any integer.
Since $y_{j,k}$ is a constant in any one column of the $p_k$ table, a column,
or phase mean, of m rows in such a table is

$$X_{j,k} = y_{j,k} + \frac{\Sigma \Delta_{(i-sp_k),k}}{m} = y_{j,k} + \Delta_{j,k}. \qquad (8)$$

The larger the value of m, i.e., the shorter $p_k$ is with respect to n, the
smaller is this uncorrelated term in our column means. These $X_{j,k}$
are the $p_{ik}$ of (1).
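
The formation of the phase means may be sketched as follows. This is an illustration under stated assumptions, not the writer's own computing scheme: the function name is hypothetical, the table is built from complete rows only, and any leftover data are dropped from the beginning of the series so that the last row ends with the last datum.

```python
def phase_means(data, period):
    """Arrange data in rows of `period` columns and average each column.

    Returns (means, m), where m is the number of complete rows used.
    Assumption: leftover values at the start of the series are discarded,
    so that the last row of the table ends at the last datum.
    """
    m = len(data) // period
    if period < 1 or m < 1:
        raise ValueError("need at least one complete row of data")
    used = data[len(data) - m * period:]
    rows = [used[s * period:(s + 1) * period] for s in range(m)]
    means = [sum(row[j] for row in rows) / m for j in range(period)]
    return means, m


# Hypothetical series of twelve data and a trial period of four intervals.
series = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0, 5.0, 6.0, 7.0, 8.0]
means, m = phase_means(series, 4)
print(means, m)
```

Each column mean averages m values, so the accidental part of a mean shrinks as m grows, which is the point made in the text about short periods.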

If m be large, $X_{j,k}$ has a comparatively small uncorrelated part and,
therefore, will correlate more highly with future data than if m be
small and a larger uncorrelated part remain. Let us consider a period
$p_k$ for which m is large, say 12 or more, and one $p_h$ for which m is small,
say about 2. For simplicity let us assume $y_{j,h}$ and $y_{j,k}$ equally valuable
in prediction. Then, since the $\Delta$ part of $X_{j,h}$ is large and that of $X_{j,k}$
is small, $X_{j,k}$ is more valuable for prediction purposes than $X_{j,h}$.
However, in correlating with the past data, the accidental part
correlates, since we form the products $x_{i-sp_k} X_{j,k}$ and the $\Delta$ part of the X
is partly made up of the $\Delta$ of the x. For this reason there will be a
spuriously high correlation which becomes greater, the smaller the
value of m. As a result, periodicities which are so long that m is
small will dominate the solution, despite the fact that the shorter
periods tend to be the more valuable from the standpoint of prediction.
In other words, the combination to represent best the past data cannot
be the best for prediction purposes.

If our coefficients $R_{p_k}$ have been formed from sufficient pairs of data
that $\epsilon_{R_{p_k}}$ is small compared to $R_{p_k}$, we may expect to predict from the
periodicity $p_k$ with approximately the correlation obtained in our
original data. For this purpose if we use the R’s instead of the r’s,
equations (4) may be used to give the best prediction possible, not from
the means X, but from the individual data preceding the predicted
one by the various values $p_k$. We have already seen that using the
X’s and the $r_{x,p_k}$’s we shall get the best representation of past data
from (4). We wish now to determine a correlation coefficient to
substitute for these in order to get the best predictions from the X’s,
which are obviously our best predicting material.

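Consider equations (5), (6) and (7).

$$\Sigma x_i x_{i+p_k} = \Sigma y_{j,k}^2 + \Sigma (y_{j,k}\,\Delta_{i+p_k}) + \Sigma (y_{j,k}\,\Delta_i) + \Sigma (\Delta_i \Delta_{i+p_k}).$$

Since the $\Delta$ terms do not correlate over the period $p_k$, the last three
terms are zero, and

$$R_{p_k} = \frac{\sigma_{y_k}^2}{\sigma_x^2}. \qquad (9)$$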


Consider now the correlations between the phase means $X_{j,k}$ and
future data not used in forming them. We shall refer to any one
such future datum as $x_i$. Since each $x_i$ has its accidental part entirely
independent of that of the corresponding $X_{j,k}$, the correlation can be
only through $y_{j,k}$. We shall designate correlations of these means with
future data by $r'_{x,p_k}$, contrasting with the $r_{x,p_k}$’s which represent the
correlations of these means with the past data.


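$$X_{j,k}^2 = y_{j,k}^2 + 2y_{j,k}\Delta_{j,k} + \Delta_{j,k}^2 \qquad (10)$$
$$x_i^2 = y_{j,k}^2 + 2y_{j,k}\Delta_{i,k} + \Delta_{i,k}^2. \qquad (11)$$

Since the $\Delta$ terms are uncorrelated, the cross terms will disappear when
we sum. Summing the $x_i^2$’s, we get

$$\sigma_x^2 = \sigma_{y_k}^2 + \sigma_{\Delta_k}^2, \qquad (12)$$

and summing the $X_{j,k}^2$’s we get, using equation (8),

$$\Sigma X_{j,k}^2 = \Sigma y_{j,k}^2 + \Sigma\left[\frac{\sum_{s=1}^{m}\Delta_{(i-sp_k),k}}{m}\right]^2 = n\sigma_{y_k}^2 + \frac{n}{m}\,\sigma_{\Delta_k}^2 \qquad (13)$$

and by substituting (12)

$$= n\left[\sigma_{y_k}^2\left(1 - \frac{1}{m}\right) + \frac{\sigma_x^2}{m}\right]$$

and, using equation (9),

$$= n\sigma_x^2\left[R_{p_k}\left(1 - \frac{1}{m}\right) + \frac{1}{m}\right]. \qquad (14)$$

$$r'_{x,p_k} = \frac{\Sigma X_{j,k}\,x_i}{\sqrt{\Sigma X_{j,k}^2\;\Sigma x_i^2}}; \qquad (15)$$

therefore (14) is one term of the denominator of the desired $r'$. The
other term is $n\sigma_x^2$.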





Now

$$\Sigma X_{j,k}\,x_i = \Sigma y_{j,k}^2 = n\sigma_{y_k}^2$$

and by (9)

$$= n\sigma_x^2 R_{p_k}. \qquad (16)$$

Substituting (14) and (16) in (15),

$$r'_{x,p_k} = \frac{R_{p_k}}{\sqrt{R_{p_k}\left(1 - \frac{1}{m}\right) + \frac{1}{m}}}, \qquad (17)$$

the correlation to be expected between the phase means of the period
$p_k$ and future data. Using these in (4), in place of each $r_{x,p_k}$, we solve
for our constants a; and using the phase means $X_{j,k}$ as the p’s, we
predict from (1).
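
The dependence of the expected correlation on the number of rows m may be illustrated numerically. The sketch below evaluates equation (17) as reconstructed from equations (14) to (16), namely R divided by the square root of R(1 − 1/m) + 1/m. The function name and the trial value of R are hypothetical.

```python
from math import sqrt


def expected_future_correlation(R, m):
    """Correlation expected between phase means of m rows and future data."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 < R <= 1.0:
        raise ValueError("R must lie in (0, 1]")
    return R / sqrt(R * (1.0 - 1.0 / m) + 1.0 / m)


# Hypothetical periodogram ordinate R = 0.25 and several table depths m.
for m in (1, 2, 12, 100):
    print(m, round(expected_future_correlation(0.25, m), 4))
```

With a single row the expression reduces to R itself, and as m increases it approaches the square root of R, in agreement with the statement that phase means built from many rows are the more valuable for prediction.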

In the preceding work we have assumed that each value $p_k$, as
determined from the periodogram, has a length equal to an integral
number of datum intervals.

In computing a correlation periodogram it is impractical to attempt
correlations for lags which are not such integral multiples of the datum
interval. For this reason the length of $p_k$ as determined will be an
integral number of these intervals, while the actual length for this
periodicity will be $p_k + \Delta$, where $\Delta$ is, in general, less than one-half and
is unknown.

Suppose, as is the case in rainfall data, that the correlations for lags
of one or two are either very small or negative. Then, since each row
of the table which forms the X’s is displaced by $\Delta$ times the datum
interval, we find that when we have gone back g rows from the last one
there is a displacement of $g\Delta$, and the row will not, unless by chance $\Delta$
is very small, contribute anything of value to its X and may actually
decrease its value for prediction purposes. If the correlation between
adjacent data is high, as in the case of a simple periodicity, this is not
serious.

We may meet this difficulty in two ways. The simpler is to predict 
from the one datum preceding the predicted one by $p_k$. In such a case the
ordinary form of multiple correlation theory must be used. Positive 
and negative correlations become equally valuable for prediction 
purposes. 

The longer but better way is to compute a short auxiliary
periodogram between the last row of the table and each of the preceding rows
and to adjust these rows by the lag which gives the maximum
correlation. Then, after these shifts have been made, the X’s are formed as
before. When this is done the modified correlation theory will be
used. There are two advantages of this procedure, in addition to the
adjustment for $\Delta$: First, if we have a cycle of average length $p_k$,
instead of the periodicity $p_k$, its variation is automatically cared for.
Secondly, if we have a negative correlation we can reverse the signs 
of the last row and of alternate rows back from it and use what one 
might, for want of a better name, term a negative periodicity. 







SUMMARY OF THE ROUND TABLE DISCUSSION OF THE 
TEACHING OF SOCIAL STATISTICS TO PROSPECTIVE 
SOCIAL WORKERS 


By MAURICE J. KARPF


The Committee on Social Statistics of the American Statistical As- 
sociation, in conjunction with the Committee on Sociology and Social 
Work of the American Sociological Society, held a joint session in con- 
nection with the meetings of the two Societies in Cleveland, on Decem- 
ber 31, 1930, devoted to a discussion of the teaching of social statistics 
to prospective social workers. This session was in the form of a round 
table, led by Ralph G. Hurlin, Director of the Department of Statistics 
of the Russell Sage Foundation, and was presided over by the writer. 
The discussion centered around three questions: (1) the purpose, (2) 
content and (3) method, which should characterize a beginning course 
in social statistics to future social workers. 

Dr. Hurlin, who has had a great deal of experience in the last few 
years in dealing with social work statistics in connection with the ma- 
terial which he has been gathering from family welfare and other social 
agencies throughout the country, with respect to the amount of relief 
given, number of cases handled, etc., based his discussion on his experi- 
ence in gathering, treating and interpreting the data, and also on his 
experience in teaching prospective social workers in the Training School 
for Jewish Social Work. He raised the question as to what should be 
the purpose of a course in social statistics for social workers. Should 
it be looked upon as discipline? Should it aim to give the case workers 
a critical point of view? Or should it aim to give them tools for treat- 
ing quantitative data? He made it clear that the content of the course 
would depend very largely upon the aim set for it. In like manner 
would the method of teaching vary with the purpose which the course 
or courses are to serve. Thus if the aim is to provide the students 
with discipline, a great deal of laboratory work and drill would be 
necessary. If, however, only critical ability and a point of view are to 
be developed, then drill would be of less importance and emphasis would 
have to be placed on analysis of statistical studies. Similarly, if the 
course primarily aims to give intensive training in statistical procedures, 
it would have to be built up along lines different from those necessary 
if only a critical point of view is aimed at. 

In the discussion which developed, the following points were
emphasized: First, that since it is impossible to tell what type of work the
future social worker would engage in during the first years after leaving
school, it is necessary to make the course as broad and general as
possible. It was made clear that the needs of the case worker differed from
those of the case-work executive, and that the needs of case workers
in a large organization are not the same as those in a smaller
organization in smaller communities. Second, that a course in statistics
should give case workers an appreciation of the significance of the fact 
items which they are asked to collect for the purpose of accumulating 
data for statistical treatment and for statistical control. Third, that 
the social worker should be given some facility for using quantitative 
data and an attitude of caution should be developed in the use of data 
too limited for adequate statistical treatment, but too numerous for 
case study and case analysis. Fourth, that a course in statistics 
should be made compulsory because otherwise it is not likely that many 
students would elect it. It was felt by all those assembled that a 
course in statistics is essential for adequate training for social work. 

No agreement was reached in the discussion as to the content of an 
introductory course. The consensus of opinion was, however, that an 
introductory course should not go much beyond the treatment of aver- 
ages, measures of variation, the problem of sampling, an introductory 
treatment of index numbers, measures of unreliability and time series. 
If to this can be added also an introductory treatment of correlation, so 
much the better. By this was meant merely acquainting the student 
with correlation, the meaning of the coefficient, its significance and lim- 
itations, rather than developing facility in using this device. It was 
felt, also, that it was desirable to give the student some exercise in 
tabular and graphic presentation of data. 

The relation of a course in statistics to a course or courses in scientific 
method was discussed. It was suggested that statistics does not take 
the place of a course or courses in scientific method, and that in addi- 
tion to statistical methods students should also be introduced to the 
various other methods of social research. 

No agreement was reached as to the amount of time to be given to 
statistics, since this would depend entirely upon the aims and content of 
the course. 

A number of teachers of statistics in different schools of social work 
and colleges were present at this session. The feeling was expressed 
that it is desirable for teachers of statistics in schools of social work to 
get together at least once each year, in connection with the meetings 
of the Association, and a resolution was passed asking the Committee on 
Social Statistics to arrange for another session in the coming year. 







MINUTES OF THE ANNUAL BUSINESS MEETING 


The Annual Business Meeting of the American Statistical Association was 
called together on Tuesday, December 30, 1930, at 9.00 a.m., by President Mal- 
colm C. Rorty. It was held at the Statler Hotel, Cleveland, Ohio. 

The acting Chairman of the Committee on Fellows, Horace Secrist, reported 
that that Committee had elected as Fellows, W. Randolph Burgess and Harry C. 
Carver. 

Edwin B. Wilson, Chairman of the Committee on Honorary Members, re- 
ported that that Committee nominated as Honorary Members, Ronald A. Fisher 
of Rothamsted and Sven Dag Wicksell of Lund. These nominees were elected by 
unanimous vote of all members present. 

The report of the Advisory Committee on the Census was submitted by Walter 
F. Willcox, Chairman of that Committee. The report was accepted and filed. 

Two amendments to the Constitution were proposed and a majority of those 
present voted in favor of a motion to adopt them. These amendments are, 
therefore, to be presented at the next Annual Meeting for final action. They are
as follows:
The last sentence of Article VIII shall be amended to read as follows: 

The President, Vice-Presidents, Secretary, Treasurer, the Vice-Presidents for 
the preceding year, and the person or persons holding the Presidency of the 
Association during the two preceding years, shall form the Board of Directors 
for the government of the Association, three of whom shall constitute a quorum 
at any meeting regularly convened. 

Proposed to amend the first line of Article VIII of the Constitution to read:
The officers of the Association shall be a President, eight
Vice-Presidents, . . .


On motion of Walter F. Willcox, a resolution was carried instructing the 
Board of Directors to act on behalf of the American Statistical Association in 
coöperating with the International Institute of Statistics if the next meeting of
that Institute should be held either in Canada or in the United States. 

The following resolution was adopted: 


WHEREAS, the American Economic Association is considering the advisability
of appointing a joint committee for action in regard to the securing of fuller
governmental statistics on corporate incomes and profits, and
WHEREAS, other associations may desire to act concurrently to the same end,
it is
RESOLVED, to refer the vote taken on this matter to the President with power 
to appoint at the proper time members of a joint committee in concert with other 
associations. 


The Annual Business Meeting of the American Statistical Association was
continued at 9.00 a.m., Wednesday, December 31, 1930, at the Statler Hotel, 
Cleveland. President Malcolm C. Rorty was in the chair. 

Stuart A. Rice, Chairman of the Committee on Social Statistics, made the re- 
port on the year’s work of that Committee; and the report was accepted. 

Bryce M. Stewart presented the report for the Committee on Governmental 
Labor Statistics. This report was also accepted and filed. 







In the absence of the Chairman of the Committee, Bennet Mead presented the 
report of the Committee on Institutional Statistics. The report was adopted. 
Edwin B. Wilson submitted the following nominations for the various offices of 
the American Statistical Association: 
President: William F. Ogburn 
Vice-Presidents: 
1. Collection and classification of data and administration of statistical 
agencies, 
Donald R. Belcher 
2. Statistical and actuarial methods and technique, and the teaching of
statistics,
Harry C. Carver
3. Facts and methods pertaining to political science, sociology, social
welfare, labor problems, and vital statistics,
William A. Mackintosh
4. Facts and methods related to anthropology, biometry, psychology, and
education,
Sewall Wright
5. Facts and methods bearing upon economics and economic theory, 
Walter W. Stewart 
6. Facts and methods pertaining primarily to business, 
Fred G. Tryon 
Secretary-Treasurer: Willford I. King 
Editor: Frank A. Ross 
A motion was carried instructing the Secretary to cast a unanimous ballot for the 
election of these officers. 

The Secretary, Willford I. King, made the Secretary’s report covering the year 
1930. As Treasurer of the Association, he also submitted the report of the 
Treasurer. 

The report of the Editor for the year 1930 was submitted. 

The following resolutions were unanimously adopted: 

BE IT RESOLVED, that the American Statistical Association, assembled at its
Annual Meeting, desires to extend most hearty thanks to the Chamber of
Commerce of the City of Cleveland, to the Statler Hotel, and to Messrs. D. C. Elliott,
Bradford B. Smith, and Leonard P. Ayres, for the admirable arrangements made
for the entertainment of the members of the Association attending the Annual
Meeting, and for the program of the Association. Without their assistance, it
would have been impossible to have conducted the meetings with dispatch and
convenience.

BE IT RESOLVED, by the members of the American Statistical Association
assembled at the Annual Meeting, that the thanks of the Association are due to 
President Malcolm C. Rorty and to the Vice-Presidents of the Association and 
others who assisted him in the preparation of the unusually interesting and en- 
lightening program of the 1930 meeting. 


WILLFORD I. KING, Secretary







REPORTS 


Report of the Secretary 


Membership, December 1, 1929 
Additions during 1930 
Deductions during 1930: 


Resignations 
Dropped because unable to locate 
Dropped for non-payment of dues 


Net gain in membership during 1930 


Membership, December 1, 1930 
Besides members we have: 
Subscribers, mostly libraries 
Domestic exchanges 
Foreign exchanges 


Total members, subscribers and exchanges 


The following deaths were reported during the year: 
Dr. John Brownlee Mr. John Hyde 
Mr. R. C. Edmondson Sir George H. Knibbs 
Mr. D. E. Felt Mr. Edward B. Morris 
Professor Arthur T. Hadley General C. M. Oberoutcheff 
Dr. J. Arthur Harris Mr. Herman Rumpen 
Dr. George K. Holmes Mrs. Jennie Lee Schram 
Mr. Burr H. Humiston Professor Henry R. Seager 

Mr. Rufus W. Weeks 


Four numbers of the JOURNAL, containing 505 pages, together with the
PROCEEDINGS Supplement, containing 198 pages, have been issued in 1930. These
compose Volume XXV.

There were 27 pages of paid advertising in the four issues. 

Mr. H. B. Stair, of the Illinois Bell Telephone Company, has succeeded Mr.
Robert B. King as District Secretary of the Chicago Chapter. Mr. King is now
with the American Telephone and Telegraph Company of New York
City.







We have reports of meetings held by district organizations as follows: 
Number Average 
of Meetings Attendance 
29 
120 
57 
33 
37 
17 
192 
34 


3 25 

3 87 

1 60 
WILLFORD I. KING, Secretary


Report of the Treasurer, Covering the Period December 1, 1929, 
to November 30, 1930 


Gross Receipts and Disbursements 


Cash on hand December 1, 1929 $1,487.83 


Gross cash receipts during year 
$23,940.18 


Gross cash disbursements during year 23,133.96 


Cash on hand November 30, 1930 $806.22 


Gross cash disbursements during year $23,133.96 
Net cost of bonds and stocks purchased $3,267.17 
Refund on securities 


Gross cost of bonds and stocks 
Miscellaneous refunds 


Functional Apportionment of Gross Expenses 
(Including Overhead) 
Annual meeting and New York meetings 
JOURNAL and PROCEEDINGS 10,739.00 
Employment Clearing House 
Securing new members 


$19,320.40 







Net Receipts and Disbursements 


Cash on hand December 1, 1929 
Gross cash receipts during year 


Purchase of bonds 
Other cash payments during year 


Net cash disbursements during year 
Cash on hand November 30, 1930 


Cash on hand December 1, 1929 
Net Cash Receipts 


Individual membership dues 
Corporate membership dues 


Advertising 
Interest on bonds and deposits 


New York meetings 
Sale of miscellaneous books to members 
Receipts from authors for reprints 


Net Cash Disbursements 


(Expenses) 
Cost of JOURNAL:
December, 1929, edition, including reprints
March, 1930, edition, including reprints
PROCEEDINGS, including reprints
June, 1930, edition, including reprints 
September, 1930, edition, including reprints 


$22,452.35 


20,901.86


$22,389.69


$3,567.17 
19,566.79 


$23,133.96 


1,550.49 


$10,517.22 
800.00 
3,230.56 
1,111.96 
504.00 
613.34 
148.75 
3,576.40 
109.50 
290.13 


$1,297.98 
1,289.05 
1,419.01 
1,055.49 
1,281.34 


$6,341.97 







New York meetings $2,611.36 

Postage, stationery, and miscellaneous printing 1,426.47 
6,699.48

324.00 

Telephone and telegraph charges 

Annual Meeting 

JOURNALS bought back 

Shipping JOURNALS

Miscellaneous expenses, Secretary’s office 

Miscellaneous expenses, Editor’s office 

Miscellaneous books sold to members 


Total expenses 
Cost of bonds and stock bought 


Total disbursements $21,583.47 
Cash in checking account, Bank of America, November 
$461.97 
Checks deposited for collection in Bank of America, 
November 30, 1930 10.00 
Cash in Emigrant Industrial Savings Bank, Novem- 


806.22


$22,389.69


On November 30, 1930, the Association possessed, in addition to stationery 
and miscellaneous supplies, the following assets: 


Accounts receivable (unpaid advertising) 

JOURNALS and PROCEEDINGS, 9215 copies.

Memorial Volume, 52 copies. 

Bowley-Edgeworth pamphlets, 5 copies. 

Bonds—market value November 30, 1930, plus accrued interest. . . . 

Guaranteed and preferred stocks—market value November 30, 1930 3,960.00 
There were no bills outstanding. 


WILLFORD I. KING, Treasurer







Report of the Auditing Committee 


We have examined the books and records in the office of the Treasurer of the 
American Statistical Association and have verified the bank balances. 

We hereby certify that the Treasurer’s report submitted is in accordance 
with the books and records, and, in our opinion, correctly represents the financial 
condition of the American Statistical Association as of November 30, 1930, and 
receipts and disbursements during the year December 1, 1929, to November
30, 1930.

SUMNER T. PIKE
ROBERT B. KING
Auditing Committee 
New York, N. Y. 
December, 1930 


Report of the Editor 


The Editor has no new developments to report during the past twelve months. 
By strenuous editing down of the papers given at the Annual Meeting in 
Washington, the PROCEEDINGS volume published in March, 1930, was kept to a
size commensurate with its budget. This was further facilitated by the
publication in Statistics in Social Studies, under the Editorship of Stuart A. Rice,
of the papers presented at the joint sessions with the American Sociological
Society.


In consequence it has been possible during the year to enlarge to a limited 
extent the various issues of the JOURNAL. The number of pages printed in the
several issues was as follows: 


702 (regular issues, 504) 


In the several years in which the PROCEEDINGS have been published there have
been printed 731 total pages in 1929, 678 in 1928, and 550 in 1927. 

Much of the material submitted for publication in the JOURNAL has been
expensive to print. In order to cover the extra cost, donations have been 
solicited from the agencies which sponsored the studies. In this way, although 
the cost of publication has exceeded the budgetary provision, the cost to the 
Association has been kept within bounds. 

The Editor wishes once again to thank all those whose coöperation has made
it possible for the JOURNAL to keep up the definite progress it has shown in past
years. It is hoped that the coming year will see further strides. 

FRANK ALEXANDER ROSS, Editor


















Report of the Census Advisory Committee 


At the request of the Director of the Census a meeting of the Census Advisory 
Committee was held at 9.30 a.m., on Saturday, November 8, 1930, at the Bureau 
of the Census in Washington, to consider the advisability of making a sample 
census of unemployed. Four members of the Committee, Professors George E. 
Barnett, W. I. King, Leo Wolman, and Walter F. Willcox, and one former mem- 
ber, Professor G. F. Warren, who had been especially invited by Director 
Steuart to attend, were present. 

The Director of the Census in a preliminary explanation said that he had 
been asked to get the opinion of the Committee on the question of taking a 
census of unemployment in a limited area or for certain districts as a sample, 
so that the results could be used for comparison with those of the complete 
census of unemployment taken in April, 1930, and as a basis in estimating the 
total number of unemployed in the United States at the present time. 

After a full discussion the Committee adopted unanimously the following 
resolutions: 

1. The Committee believes that at the present time the best aid that the Bu- 
reau of the Census can furnish on the subject of measuring unemployment is 
by expediting the publication of the state bulletins giving detailed results of 
the unemployment census taken last April, including classification of the unem- 
ployed by age, by sex, by number of weeks unemployed, by reason for unem- 
ployment, by family relationship, and by occupation. These bulletins will be 
of great value in measuring the nature, extent, and seriousness of unemployment. 

2. The Committee believes it desirable that a special tabulation of informa- 
tion on the schedules of the current census be made to show the number of unem- 
ployed in their relation to the families to which they belong and thus add 
materially to what is now known about unemployment. 

3. The Committee believes that a special enumeration of the unemployed 
- of population through a sampling census is not to be recommended at this
time.


WALTER F. WILLCOX, Chairman


Report of Committee on Social Statistics 


The Social Statistics Committee of the American Statistical Association 
begs leave to submit its report covering its activities during the year 1930 as 
follows: 

The Committee’s existence and activities have been premised upon the fact, 
to which it has several times called the attention of the Association, that the 
membership of the latter is overwhelmingly composed of persons whose primary 
interest is in economic and business statistics. While we regard this interest as 
desirable, and worthy of emulation by other groups of statisticians, its effect is 
to draw the Association away from the balanced and well-rounded program of 
interests to which it is in theory devoted. The primary function of the Com- 
mittee on Social Statistics then seems to be to represent within the Association 
the interests which are more generally and more broadly “social” in character. 
A more specific but secondary function has been recognized in the Committee 
in connection with some newer aspects of social welfare activities. 













In reviewing the work of the Committee during the year several accomplish- 
ments and activities may receive special mention. 

1. Publication of Statistics in Social Studies. Following the annual meeting 
of 1929, several plans were discussed for the publication of the series of papers 
arranged by the Committee for the program of that year. The Committee 
sought in its program to exhibit the nature of the problems that are encountered 
when the methods of statistics are applied to social and sociological studies. 
Although the subject matter was diverse, the approach seemed unified, and it 
was felt that there would be value in bringing the papers together in a single pub- 
lication. With the concurrence of the officers of the American Statistical 
Association and the American Sociological Society, and with the approval of the 
several contributors, the Committee arranged with the University of Pennsyl- 
vania Press for the publication of the collection under the title Statistics in Social 
Studies. This volume appeared in the spring of 1930 under the editorship of the 
chairman of the Committee. Assistance in advertising the volume was given 
by the secretaries of the two societies and a half price edition was made available 
to members in recognition of the fact that abstracts of the papers were not 
printed in the PROCEEDINGS. I am advised by the University of Pennsylvania
Press that the book has had a satisfactory sale, although the contract entered 
into does not call for a first report until after the close of the year. It is hoped 
that royalties will be sufficient to meet the expenses advanced for the publica- 
tion of the book, and in addition provide a small balance for the Committee’s use. 

2. A subcommittee under the chairmanship of Dr. Neva Deardorff, Director 
of Research of the Welfare Council of New York City, has coöperated actively
with the Census Bureau in making plans for the tabulation of the 1930 Census 
data relating to families. The suggestions of the subcommittee were cordially 
welcomed by the Census Bureau and were incorporated in its tabulation schedule. 

3. A second subcommittee under the chairmanship of Professor F. Stuart 
Chapin of the University of Minnesota, has devoted its attention to the statis- 
tics of public social and health work expenditures. These were defined as fol- 
lows: “Statistics of expenditure for public social and health work are public 
expenditures for the under-privileged or for the socially inadequate. They are 
ordinarily designed to equalize opportunities.” Various activities and services 
in sixteen administrative units of the State of New Jersey which might be clas- 
sified as public social and health work have been listed. These have been 
grouped under such headings as “the poor,” “hospitals,” “public health,” 
“correction,” and “recreation.” Miss Kate Huntley prepared an analysis of 
the total public appropriations and expenditures of the city of New York for 
1926, and indicated those which have been included in the Welfare Council’s 
Study of Income and Expenditures of Social Agencies in that city. It was found 
that at least fifteen administrative units and sub-divisions of the government of 
the city of New York made expenditures for public social and health work. 

The Committee proposes to list: (1) Published and accessible statistical 
sources about the accuracy of the classification of which there would be little 
question; such for example, as the expenditures for poor relief. (2) Published
reports of expenditures of doubtful classification, such as expenditures for penal 
institutions. (3) Social and health activities for which there are no published or ac- 
cessible data; in this field it would be necessary to seek to establish standards of 
reporting and publication. An example would be the expenditures for super- 
vised recreation within a large park system. 

The subcommittee desires to secure $2,500 for statistical clerical services in 
completing its analysis. It believes that its work will aid in the improvement 
of financial statistics of cities as now published by the Census Bureau, which are 
at present far from satisfactory in the field of public social and health work; and 
also in the better classification of these expenditures by all governmental units. 

4. The Committee believed that a continuance of its participation in the 
preparation of the annual program of the Association would be desirable. It, 
therefore, sketched plans for several sessions at the current annual meeting. The 
general method of arranging the program this year, however, divided this re- 
sponsibility among the President and the several Vice-Presidents of the Statis- 
tical Association. Because of the Committee’s plans, and in view of the ab- 
sence of Vice-President Coats, who was in attendance at the London Imperial 
Conference, the Committee was asked by President Rorty to assume the respon- 
sibility of arranging three sessions at the present annual meeting. It willingly 
accepted this responsibility, although it involved the arrangement of one ses- 
sion which clearly fell more properly within the scope of another Committee of 
the Association. 

Two new members of the Committee were added to its personnel during the 
year, namely, Professor L. C. Marshall of the Institute of Law, Johns Hopkins 
University, and Dr. Meredith Givens of the Social Science Research Council. 

The problems confronting the Committee during the coming year are of two 
general sorts: (1) It faces various questions of relationship, both with similar 
Committees of other organizations, and with other Committees of the American 
Statistical Association. Thus, for example, its relationships with the American 
Sociological Society are obviously close, and within the latter there is a statistical 
division. There is, moreover, a Social Statistics Committee of the Social 
Science Research Council. It has been suggested that the work of these three 
separate organizations might be more effectively integrated. Again, certain 
of the Committee’s activities have been closely related to those of the Commit- 
tee on Institutional Statistics of the Statistical Association. Here again, it 
is possible that some more effective demarcation of responsibilities and of 
organization may be devised. These are questions to be determined by the 
officers and executive committee of the Association. 

(2) The Committee believes that it is rendering a service needed within the 
Association. As illustrating the needs we may cite the fact that the Committee 
has been approached for assistance in formulating standards of education and 
supervised experience for persons applying for positions in the field of social 
statistical research. Other activities are in various stages of discussion. We 
therefore recommend the Committee’s continuance. 


Respectfully submitted, 
STUART A. RICE, Chairman






Report of the Committee on Governmental Labor Statistics 


During the year 1930 the Committee was chiefly concerned with a study of 
public employment office statistics in Europe and America, the initiation of a 
survey of wage statistics in the United States, and the preparation of a broader 
program of employment statistics for the State of New York. 

(1) Study of Statistical Procedure of Public Employment Offices.—The International
Association of Public Employment Services, at its annual meeting in September,
1929, passed a resolution endorsing the recommendation of its committee
on uniform forms, records, and procedure that the Committee on Governmental 
Labor Statistics should be invited to investigate the subject of standard statisti- 
cal procedure for public employment offices and make recommendations to the 
International Association. 

The Committee was pleased to act on this request. It happened that the 
executive secretary of the Committee was leaving shortly to spend some 
months in Europe and it seemed that any satisfactory study would involve 
examination of the methods worked out by the principal employment exchange 
systems in Europe because of their long experience in this field. Accordingly, 
the executive secretary was commissioned to proceed on these lines during his 
stay in Europe. 

Plans for the study were discussed with officials of the International Labor 
Office and their advice followed in the selection of the countries to be studied. 
A plan of collaboration was suggested by which the reports made by the Com- 
mittee on Governmental Labor Statistics on the various countries visited would 
be available for the International Labor Office and form at least the basis of the 
statistical section of a study of employment offices the Office had planned to 
undertake. This suggestion was submitted by mail to members of the Com- 
mittee and received their unanimous endorsement. 

Material available in the library and files of the International Labor Office was 
consulted before visits were made to the countries chosen for study. 

Investigation was begun in Switzerland, in view of the fact that the executive 
secretary was stationed at Geneva and that the questionnaire might be more 
easily tried out there. The Swiss experience proved decidedly significant for 
America in that the history of the public employment service has been much the 
same as in the United States, having developed under local and cantonal auspices 
with a continuing growth of coöperation.

The investigation was carried on next in Germany, where, partly because of the 
larger and more diverse character of the country, uniform procedure has devel- 
oped more slowly than in Switzerland. A few years after the War a national em- 
ployment exchange act was passed but it was not until 1927, when the national 
system of unemployment insurance was adopted, that real effort to arrive at 
standard employment office procedure was begun. As the situation is today, 
there is no available manual of procedure. Certain definitions and methods 
have been agreed upon but the work is still not completely organized. 

The work of the municipal exchanges in Paris was next covered as the best 
in France. The employment offices in France receive subventions from the national
government and the secretary of the commission charged with the allocation
of these subventions supplied copies of the necessary reports and instructions
to the offices. 

In Great Britain the fullest facilities were accorded for a study of the employ- 
ment exchange system. In recent years the exchanges have been largely con- 
cerned with the administration of the unemployment insurance acts, but the sta- 
tistical practice is the result of two decades of experience and should prove of the 
greatest value to the United States. 

The system in Sweden was studied next. The offices in Sweden operate under 
a uniform system and receive subventions from the state, under certain condi- 
tions, for a part of their expenses. A memorandum concerning public employ- 
ment offices, issued by the Royal Social Board, one bureau of which has respon- 
sibility for the public employment offices, outlines the procedure for the offices 
and gives instructions for the daily and monthly reports required of all offices 
which receive subventions from the state. 

Time did not permit of a study of the national system of Italy but information 
is being sought from the chief of the employment service and it is hoped to include 
a section on Italy in the final report. 

A section on Canada is also being prepared and a questionnaire has been sent 
to the employment services of the various states of the United States. It is ex- 
pected that the report will be submitted to the International Association of Pub- 
lic Employment Services before its annual meeting in 1931. 

(2) Survey of Wage Statistics.—At the annual meeting in 1929 a subcommittee
on wages was appointed under the chairmanship of Roswell F. Phelps. The 
other members of the subcommittee were E. B. Patton, secretary, F. E. Crox- 
ton, R. H. Coats, and D. D. Lescohier. 

For some time the Committee has had in its program a plan for setting up a 
more adequate scheme of current wage statistics than now exists in any state or in 
the Federal Government, and the subcommittee planned to undertake a survey
of wage statistics in the United States with a general view to securing improve- 
ment and uniformity in the collection of these data. 

A preliminary digest of the material in the published state reports on wage 
statistics has been prepared. Each state has been asked for copies of the forms 
and schedules used by them in collecting wage information; the sources of the in- 
formation secured; facilities for the collection and tabulation of such wage re- 
ports; and, finally, for suggestions as to the kind of wage statistics that would be 
most helpful. This information was also requested for the Dominion of Canada. 

(3) Presentation of the Committee’s Program at the Conference of Governors.— 
When it was announced that the executive committee of the Conference of Gov- 
ernors on March 24 had decided upon unemployment as one of the four topics for 
discussion at the annual meeting of the Conference of Governors of States in Salt 
Lake City, June 30 to July 2, it was decided to place before the governors the 
Committee’s program for better statistics on employment and unemployment. 

A small group of New York members (Berridge, Givens, Hurlin, Patton, and 
Miss van Kleeck) decided that the Committee’s present program should be
newly formulated to include current recommendations as acted upon from time to 
time in Committee meetings but not yet fully adopted in governmental bureaus. 

The reformulated program was sent on May 16 to all the members for comment 
and suggestions regarding the best procedure to promote action by the Confer- 
ence of Governors. The tentative program was circulated informally at the 
Louisville meeting of the Association of Governmental Officials in Industry held 
in May. 

A fifteen-page memorandum, “A Program for Statistics of Employment and 
Unemployment in the United States: Recommendations of the Committee on 
Governmental Labor Statistics of the American Statistical Association,” was
forwarded with a covering letter on June 23 to the governor of each state, a week 
before the conference convened. 

Governor Franklin D. Roosevelt of New York gave special emphasis in his 
address to the need for better statistics of employment. A letter from Governor 
Cooper of Ohio, referring to the communication of the Committee on Govern- 
mental Labor Statistics, was included in the minutes of the conference, as fol- 
lows: 

This state stands ready to join with other states in an endeavor to put into 
effect the general plans recommended by the Committee on Governmental Labor 
Statistics of the American Statistical Association. The suggestion is made, 
however, that perhaps more rapid progress could be made and that uniformity 
could be more fully assured if the Federal Bureau of Labor Statistics were given
greater responsibility than that of a coördinating center. The specific
suggestions in this respect are that each State Bureau of Labor serve as a representative
of the Federal Bureau in the collection of current statistics of employment upon 
uniform blanks supplied by the Federal Bureau and in the compilation of such 
data according to a uniform plan. 

Acknowledgments, many of which expressed interest, were received from 
twenty-two states, either personally from the governor or from the executive 
office. 

(4) Preparation of a Broader Program of Employment Statistics for New York 
State.—A letter of July 8 from Frances Perkins, Industrial Commissioner of New
York State, asked the Committee to recommend a broader plan of employment 
statistics for the state. Acting on this request, the chairman appointed a sub- 
committee of members located in New York—Berridge, Givens, Hurlin, and 
Patton. The executive secretary also served on the subcommittee after his re- 
turn from Europe in August. W. A. Berridge acted as chairman and M. B. 
Givens as secretary. 

This subcommittee prepared a plan for the development of the employment 
statistics of the state, which was submitted to the Industrial Commissioner on 
September 10. The report recommended that to the monthly index of factory 
employment and payrolls there should be added an index of building construc- 
tion in view of its importance in the state and the successful experience of other 
states and Canada in compiling such an index. It was suggested that data 
should be secured on employment, amount of payroll, and, if possible, employee- 
hours; the entire construction industry in its broader sense should be covered, 
including (if a suitable reporting system could be developed) public and quasi- 
public, as well as private, construction. 






Mention was made of the advisability of canvassing at least once a year em- 
ployers engaged in certain other specified lines to secure employment and payroll 
totals from all who employ more than a specified number. Unemployment sur- 
veys in the form of “sampling” surveys in key centers should be encouraged to 
the utmost. 

It was also recommended that the present indexes of factory employment and 
payrolls should be tested against the biennial censuses of manufactures; that the 
size of the factory employment sample should be somewhat enlarged; and that 
consideration should be given to improvement of the public employment office 
statistics. 

The Commissioner accepted the report in principle and it was agreed that, in 
view of the needs and budget limitations, the work should center for the time 
being on statistics of building construction. 

(5) Endorsement of Program for Employment Statistics by the Association of 
Governmental Officials in Industry.—At the convention of the Association of
Governmental Officials in Industry of the United States and Canada, held at
Louisville in May, 1930, the following resolution was adopted:

Resolved: (1) That the Association of Governmental Officials in Industry of the 
United States and Canada endorse the principle of collection and publication of 
employment and payroll figures by industries. (2) That in order to secure com- 
parability of figures, such collection and publication be on the same general lines 
as those now followed by the United States Bureau of Labor Statistics and by a 
number of the leading industrial states. (3) That, in order to avoid duplication, 
all states collecting such reports coöperate with the United States Bureau of
Labor Statistics on the same basis as now in effect between that Bureau and the
nine states coöperating with the Federal Bureau.


This is essentially the plan being advocated by the Committee on Govern- 
mental Labor Statistics, and members of the Committee were active in support 
of this action by the Association. 

At this conference there was circulated informally a tentative draft of the rec- 
ommendations to be made in June to the Conference of Governors in order that 
the members might be informed of the Committee’s action as it affected the work 
in their own states and that they might have opportunity to suggest changes in 
the statement. Their endorsement of the principles of the plan was virtually an 
acceptance of it and a number of these officials have coöperated with the
Committee in an effort to interest their Governors.

(6) Effort to Promote an Index of Employment for the New York City Government. 
—Members will recall the Committee’s experimental record of direct employ- 
ment and payrolls of the New York City Government for the months of June and 
July, 1929. This effort was made in coöperation with the Welfare Council of
New York City. 

Following this, it was agreed in conference with the Welfare Council that the 
Committee would set up a detailed procedure for the compilation of the monthly 
employment figures and that the Welfare Council, on its part, would undertake to 
approach the city authorities and persuade them to begin the regular compilation 
of the employment index and to install the procedure agreed upon. Early this 
year a memorandum was prepared which outlined the “procedure for compiling a 
monthly index of employment afforded by the New York City Government.” 






The Welfare Council has had negotiations with city officials. The Depart- 
ment of Audits was willing to consider the matter of keeping the record of em- 
ployees on the city payroll, in accordance with the plan suggested by the Com- 
mittee, after they had moved to new quarters in the Municipal Building. On 
the matter of contract employment, the results have not been encouraging. The 
Welfare Council does not wish to let the matter drop and has asked for an esti- 
mate on probable costs. If dependable information could be secured regularly, 
the State Department of Labor would arrange for its publication. Effort in this 
direction is being continued. 

(7) United States Census of Unemployment.—The members of the Committee 
were informed from time to time during the first few months of the year of the 
efforts made to promote the adoption of its recommendations for the census of 
unemployment. While a number of the Committee’s recommendations were 
adopted, two of the most important points were not approved: (a) that the time- 
unit should be a week rather than a day, and (b) that part-time unemployment 
should be included in the inquiry. The Committee’s recommendations were 
fully outlined in its annual report for last year, published in the PROCEEDINGS of
the American Statistical Association, March, 1930. 

In preparation for the joint meeting of the American Statistical Association 
and the American Association for Labor Legislation on December 30, the chair- 
man spent a week in Washington studying the procedure and preliminary pub- 
lished reports of the unemployment census. When the complete results are 
available, the Committee hopes to respond to the request of the Director of the 
Census to formulate recommendations for the possible future conduct of such a 
census. 

(8) Coöperation with International Industrial Relations Association.—In October
the International Industrial Relations Association asked the Committee to
prepare a report on fluctuations of employment in the United States and Canada 
from 1910 to 1930, in advance of the congress to be held by the Association at 
Amsterdam next August. Dr. F. C. Benham of the London School of Eco- 
nomics is conducting a similar inquiry in Great Britain and France, and Dr. Rob- 
ert Wilbrandt of Technische Hochschule, Dresden, is analyzing the data for 
Germany. Dr. W. A. Berridge of this Committee has undertaken the principal 
direction of this work for the United States and Canada, and it is hoped that a 
study may be made in China and possibly in other countries. 

(9) Program for Joint Meeting of American Statistical Association and American 
Association for Labor Legislation.—The Committee was asked by the President of
the American Statistical Association to organize a joint session with the American 
Association for Labor Legislation on the subject “Measurements of Employment
and Unemployment” for the Cleveland meeting.

(10) Developments in Labor Statistics.—The outstanding development in labor
statistics during the year was the census of unemployment conducted by the 
Bureau of the Census in conjunction with the decennial census of population. 
It is considered elsewhere in this report. Despite some disappointment concern- 
ing the character of the inquiry and fairly widespread criticism of the findings 
thus far published, this unemployment census is certainly the most ambitious 
official inquiry of the kind in the nation’s history. The results of the first census
of distribution, particularly the employment data for wholesale and retail es- 
tablishments and the building industry, will also be welcomed by statisticians 
generally. 

The bill to extend the employment statistics of the Bureau of Labor Statistics 
to non-manufacturing industries, which was prepared by the Committee at the 
request of Senator Wagner of New York State, became law. Unfortunately, the 
necessary funds to carry on this new work were not provided and the Bureau has 
been able to do comparatively little in extending its activities in the directions re- 
quired by this legislation. A beginning has been made in employment statistics 
of building construction. The Bureau was further handicapped by a require- 
ment that its index of employment and payrolls in manufacturing industries 
should be placed on a weekly basis during the first few months of the depression. 
The expenditure of time and resources on this work necessarily restricted the Bu- 
reau’s effort to undertake the extension of its employment data required by the 
new law. 

The Bureau of Mines has undertaken a very considerable improvement in 
mine-accident statistics. Up to the present the information on fatalities in coal 
mines has been obtained indirectly through the state mine inspectors and there 
has been no information whatever on non-fatal accidents outside of a few in- 
dividual states. Beginning January 1, 1931, the Bureau will collect information 
on both fatal and non-fatal accidents direct from the mine operators. The in- 
formation thus obtained will be used in conjunction with the reports on volume 
of employment, production, physical conditions in the mines, and machinery em- 
ployed, and will permit for the first time of a study of accident hazard as related 
to natural conditions and mining methods. Funds have been obtained to place 
this work on a satisfactory basis and a special division for it has been created in 
the Health and Safety Branch. The development would appear to be a signal 
improvement in the country’s system of labor statistics. 

(11) Program of the Committee for 1931.—The Committee plans to concentrate 
on the completion of its study of public employment office statistics during the 
first half of the year. The next task will be to push forward the survey of wage 
statistics upon which a good deal of preliminary work has been done. The sub- 
committee in charge of the survey plans to meet in New York early in the year to 
consider the further prosecution of its work, and it is probable that the entire 
Committee will convene immediately afterward to receive the subcommittee’s 
report and to join in the planning of the survey. 

The membership of the Committee during the year has been as follows: 


CHARLES E. BALDWIN        PAUL H. DOUGLAS        HOWARD B. MYERS
JOSEPH A. BECKER          O. A. FRIED            EUGENE B. PATTON
WILLIAM A. BERRIDGE       MEREDITH B. GIVENS     ROSWELL F. PHELPS
LOUIS BLOCH               LEONARD W. HATCH       CASIMIR A. SIENKIEWICZ
R. D. CAHN                RALPH G. HURLIN        F. G. TRYON
R. H. COATS               DON D. LESCOHIER       H. H. WARD
J. FREDERIC DEWHURST      LEIFUR MAGNUSSON       MARY VAN KLEECK, Chairman
BRYCE M. STEWART, Executive Secretary





Report of Committee on Institutional Statistics


In its last annual report your Committee referred to its efforts to secure funds 
for a study of the welfare laws, records, and statistics of the several states. It was 
hoped that such study would lead to the preparation of a model state law which 
would provide for the preparation of welfare statistics in accordance with a 
standardized scheme which would be uniform throughout the country. 

The application of your Committee to the Social Science Research Council for 
a grant of funds for the pursuit of this work is still pending. 

Your Committee held an all-day meeting in New York on November 14, 1930, 
and reviewed the progress of the year in the field of welfare and institutional statis- 
tics, and considered ways and means of promoting further advance. In the 
absence of funds for research and constructive work it was felt that the action of 
the Committee must be limited to the giving of assistance to committees of or- 
ganizations seeking to improve records and statistics in their respective fields. 

During the past year members of your Committee served on the Committee 
on Statistics of the International Congress on Mental Hygiene, which met in 
Washington in May, 1930. That Committee outlined a plan for international 
statistics of the insane, feebleminded, and epileptics. Members of your Com- 
mittee also actively participated in the work of the Committee on Statistics of 
the White House Conference on Child Health and Protection. That Com- 
mittee is working on the difficult task of outlining a system of records and reports 
for institutions for children. 

The past year has witnessed a number of significant steps toward the improve- 
ment of statistics of crime and its treatment. As the matter is of great public 
interest it is reported in some detail. 

Police Statistics —An important new development in police statistics was the 
collection and publication for the first time of comprehensive monthly statistics 
concerning major offenses known to official police agencies. This work was be- 
gun in January, 1930, by the Committee on Uniform Crime Records of the In- 
ternational Association of Chiefs of Police. This was the same committee which 
had previously studied the problem of collecting adequate crime statistics, and 
had formulated a complete system for statistical recording and reporting by local 
police agencies, and for the assembling of such statistics on a nation-wide 
basis. 

On September 1, 1930, this work was taken over by the Federal Department of 
Justice as a function of the National Division of Identification and Information 
in the Bureau of Investigation and has thus acquired official status. Although 
reporting is still entirely voluntary on the part of the local police agencies, the 
area covered by the data has steadily expanded. For October, reports were re- 
ceived from 879 cities and towns located in all of the states, and including 302 
out of the total of 368 cities with a population over 25,000. These cities com- 
prised a population of about 37,500,000, or 76 per cent of the total population liv- 
ing in cities of more than 25,000. A promising start has been made in the collec- 
tion of crime data from sheriffs and other police officials operating in the rural 
areas. It is planned, also, to compile annual statistics, beginning with the year 
1930, concerning major offenses which are cleared by arrest, and concerning 
persons arrested for major offenses. 
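
The coverage figures quoted above for October can be checked with a little arithmetic. The sketch below uses only the numbers given in the report; the variable names are illustrative, and the total population of cities over 25,000 is not stated in the report but is merely implied by the 76 per cent figure.

```python
# Figures quoted in the report for October, 1930.
reporting_large_cities = 302        # cities over 25,000 that sent reports
total_large_cities = 368            # all cities over 25,000
reporting_population = 37_500_000   # population of the reporting cities (approximate)
stated_population_share = 0.76      # "76 per cent" as given in the report

# Share of the large cities that reported.
city_share = reporting_large_cities / total_large_cities

# Total population of all cities over 25,000 implied by the stated share.
# This is a derived estimate, not a figure taken from the report.
implied_total_population = reporting_population / stated_population_share

print(f"Share of cities over 25,000 reporting: {city_share:.1%}")
print(f"Implied population of all such cities: {implied_total_population:,.0f}")
```

On these figures about 82 per cent of the cities over 25,000 were reporting, and the population of all such cities implied by the report is roughly 49.3 million. The reporting cities thus account for a somewhat smaller share of population than of cities.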

Court Statistics.—A number of state judicial councils have undertaken during 
recent years to develop more adequate court statistics. An especially significant 
development of the past year was the collection of detailed data from the Ohio 
courts, under joint auspices of the Ohio Judicial Council, the Ohio Institute, and 
the Johns Hopkins University Institute of Law. This intensive piece of research 
should help to establish standards for the continuous future compilation of ade- 
quate statistical court data. 

Similar intensive studies are being undertaken in Maryland and other states 
by the Johns Hopkins Institute of Law, in coöperation with various official 
agencies. The Harvard Institute of Criminal Law, the Yale Law School, 
the Columbia University Law School, and other university groups are also 
conducting significant studies in the field of judicial statistics. 

Penal Statistics.—The Federal Bureau of Prisons, in the Department of Justice, 
has begun during the past year the compilation of detailed statistics concerning 
Federal prisoners confined in non-Federal institutions. These statistics cover 
prisoners held for trial, as well as those under sentence. It is planned also to 
develop more complete and detailed statistics of persons placed on probation by 
the Federal Courts. 

The annual census of institutions taken by the Federal Census Bureau is enter- 
ing the fifth collection year. A bill giving the Bureau full authorization for this 
work has passed the United States Senate and will probably become a law during 
the present session. We feel that the Bureau is entitled to great credit for its 
continued coöperation in the improvement of institutional statistics. 

Respectfully submitted, 
FREDERICK BANE            EMIL FRANKEL 
FREDERICK W. BROWN        BENJAMIN MALZBERG 
KATE H. CLAGHORN          BENNET MEAD 
NEIL A. DAYTON            CARL E. McCOMBS, Secretary 
NEVA R. DEARDORFF         HORATIO M. POLLOCK, President 


Report of Committee on the Encyclopaedia of the Social Sciences 


The first three volumes of the Encyclopaedia have been issued during the past 
year. With the appearance of the first volume, in January, 1930, a dinner held 
to celebrate the event was attended by 250 educators, business men, lawyers, and 
statesmen. Reviews in scientific and daily journals, both in this country and 
abroad, have given evidence of the value and usefulness of the whole enterprise. 

In order to adhere to the original time schedule of three volumes a year, it 
became necessary to increase the editorial staff and the number of its assistants 
to an extent not anticipated at the outset. Moreover, the original office space 
proved to be inadequate, and the editorial rooms were transferred to more com- 
modious, but unfortunately more expensive quarters, at 416 West 122nd Street, 
New York. The staff now numbers 50, of whom 27 are editors and editorial 
assistants. To make possible this enlargement of staff and office, the outlay was 
increased beyond the original budget, the total expenditure amounting to 
$168,645.39. 

The budget for the current year, beginning December 1, 1930, adopted by the 
Executive Committee at its annual meeting in December, is $200,000, of which 
$153,000 is for salaries, $20,000 for fees to authors, and the remaining $17,000 for 
office expenses and maintenance. 
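
The budget items in the preceding paragraph can be added up as a check. The sketch below uses the figures exactly as they appear in this text; the variable names are illustrative only.

```python
# Budget figures as they appear in this text for the year beginning December 1, 1930.
total_budget = 200_000

items = {
    "salaries": 153_000,
    "fees to authors": 20_000,
    "office expenses and maintenance": 17_000,
}

# Sum of the three itemized amounts and its difference from the stated total.
itemized_total = sum(items.values())
difference = total_budget - itemized_total

print(f"Itemized total: ${itemized_total:,}")
print(f"Stated total less itemized total: ${difference:,}")
```

As the figures stand here, the three items come to $190,000, which is $10,000 short of the stated total of $200,000. The text gives no means of telling whether a figure is misprinted or an item is omitted.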

The income necessary to meet this budget has been derived from subscriptions 
from individuals, together with grants from two foundations, the Rockefeller and 
the Russell Sage. To these has been added the Carnegie Corporation, which has 
granted $100,000 for the present year. The budget will be defrayed from these 
sources without counting expected royalties. The president, Edwin R. A. Selig- 
man, has reported assurances of income for corresponding expenditures for the 
following years to make possible the completion of the project. 

The Joint Committee of representatives of the coöperating associations, including 
the American Statistical Association, will hold an adjourned meeting in the late 
spring. Meanwhile, suggestions with reference to the conduct of the enterprise 
are welcomed by the editorial staff. 

MARY VAN KLEECK 
R. H. COATS 
Representatives of the American 
Statistical Association in the 
Joint Committee 


Report of the Representative on the Joint Committee for the Development of 
Statistical Applications in Engineering and Manufacturing 


Representatives on this Committee are: 
L. K. Sillcox, American Society of Mechanical Engineers 
W. E. Fulweiler, American Society for Testing Materials 
E. V. Huntington, American Mathematical Society 
W. A. Shewhart, Chairman, American Statistical Association 

This Committee is sponsored by the American Society of Mechanical Engineers 
and the American Society for Testing Materials. Members of this Committee 
have been active in extending the applications of statistical theory in engineering 
fields under the following heads: (1) Calling attention to the fact that modern 
statistical concepts of physical properties and physical laws and of the nature of 
causal indeterminateness make the use of statistical theory necessary in every 
field of engineering; (2) Calling attention to the latest developments in the logic 
of discovery and the mathematical theory of distribution and estimation. 

Members of the Committee have been active in the presentation of papers 
before scientific societies and in the preparation of text material calling attention 
to this new field of application. 

Members of the Committee have taken active part in the formation of a special 
Committee of the American Society for Testing Materials to consider the special 
applications in the field of standardization of interest to that Society. This 
special Committee has already made available a bibliographic guide for engineers 
and has under way an interesting program to be given under the auspices of the 
American Society for Testing Materials in connection with their annual meeting 
at Chicago, Illinois, during the month of June, 1931. Dr. Anson Hayes of the 
American Rolling Mills Company is chairman of the Program Committee for 
this meeting and has succeeded in interesting representatives from several large 
commercial organizations. 
Respectfully submitted, 
W. A. SHEWHART, Chairman 


Report of the Subcommittee on Aims and Program of the Committee 
on Price Statistics 


It seems wise to confine attention for the present to commodity prices, giving 
chief attention to wholesale spot and contract prices (excluding technical 
“futures”) and touching incidentally on certain retail price statistics. It is hoped 
subsequently to give fuller attention to the more difficult problem of retail price 
statistics. 

The general objects which the Committee hopes to accomplish are: (1) to se- 
cure accurate, detailed descriptions of the price information now available, and 
of the methods employed in collecting and compiling it; (2) to analyze and formu- 
late criticisms of existing information; (3) to make constructive suggestions as to 
methods and agencies for improving price information and to use the influence of 
the Committee in bringing about such improvements; (4) to make more gener- 
ally available price data collected for special purposes and not now readily ac- 
cessible to general users, e.g., index numbers and price statistics submitted in 
public utility and railroad rate cases. 

Two types of research are essential in furthering these objectives: (a) a general 
survey of the work of the chief price-compiling agencies, and (b) a series of specific 
economic studies, one for the prices and related data of each of our leading com- 
modity groups. 

The description of the price-work of leading compilers would involve a careful 
survey of the data collected on each important commodity by the chief agencies, 
and of methods, specifications, and personnel employed in collecting such data, 
in preparing these data for publication, and in combining them into indexes. 
The agencies to be surveyed should include the Federal Departments of Labor, 
Agriculture, and Commerce, The National Industrial Conference Board, and ten 
to twenty trade associations and trade journals, e.g., The Oil, Paint, and Drug 
Reporter, Fairchild’s, Coal Age, The American Contractor, Petroleum News, Na- 
tional Fertilizer Association, Engineering and Mining Journal, American Metal 
Market, Daily Metal Reporter, Iron Age, American Lumberman. The result would 
be a catalogue of price information that should increase both its availability and 
its proper use. Full use would, of course, be made of existing studies in this field. 

The critique of the work of leading compilers would include: (1) a statement 
of overlappings in present compilations; (2) a survey of the importance of various 
commodities and markets so as to determine the chief gaps: omitted commodities 
and groups, incomplete geographical representation, omissions in stages 
where a commodity passes through a series of “vertically” related markets; 
(3) an appraisal of the adequacy of the samples included in general wholesale 
price indexes and of the appropriateness of the weightings and formulae; (4) a 
statement of inadequacies in specifications and other phases of publication of 
price data for individual commodities; and (5) a critique of the methods of treating 
data in their collection and preparation for publication. 

MORRIS A. COPELAND 
F. C. MILLS 





American Statistical Association 


COMMITTEES FOR 1931 


Members of the Social Science Research Council 
Wesley C. Mitchell, Chairman          Term expires December 31, 1932 
Edgar Sydenstricker                   Term expires December 31, 1931 
[name illegible]                      Term expires December 31, 1933 
Advisory Committee on the Census 
L. D. H. Weld                         Term expires December 31, 1931 
Robert E. Chaddock                    Term expires December 31, 1932 
[name illegible]                      Term expires December 31, 1933 
Committee on Finance 
Edmond E. Lincoln, Chairman Frederick R. Macaulay 
Leonard P. Ayres 


Committee on Institutional Statistics 
Horatio M. Pollock, Chairman Kate H. Claghorn 
David M. Schneider Neva R. Deardorff 
Emil Frankel Frank Bane 
Carl E. McCombs Frederick W. Brown 
Bennett Mead Neil A. Dayton 
C. Luther Fry 


Representative on the Board of Directors of the Encyclopaedia of the Social Sciences 
Wesley C. Mitchell 
Joint Committee on Standards for Graphics 
Karl G. Karsten, Chairman Irving Fisher 
Frederick E. Croxton Arthur H. Richardson 
Committee on Governmental Labor Statistics 
Mary Van Kleeck, Chairman W. A. Berridge 
Bryce M. Stewart, Executive Secretary Louis Bloch 
Eugene B. Patton, Secretary R. D. Cahn 
Charles E. Baldwin R. H. Coats 
Joseph A. Becker J. Frederick Dewhurst 
Paul H. Douglas O. A. Fried 
Meredith B. Givens Leonard W. Hatch 
Ralph G. Hurlin Don D. Lescohier 
Leifur Magnusson Howard B. Myers 
Roswell F. Phelps Casimir A. Sienkiewicz 
Fred G. Tryon H. H. Ward 
Representative on the Advisory Board of the American Year Book 
Edwin W. Kopf 
Joint Committee on the Encyclopaedia of the Social Sciences 
Mary Van Kleeck, Chairman 
Robert H. Coats 
Committee on Nominations 
Malcolm C. Rorty, Chairman Edmund E. Day 
Robert E. Chaddock 







Committee on Price Statistics 
Carl Snyder, Chairman Robert W. Burgess 
Morris A. Copeland, Secretary Irving Fisher 
Frederick C. Mills Warren M. Persons 
O. C. Stine Willford I. King 
Holbrook Working Woodlief Thomas 


Representative on the Joint Committee for the Development of Statistical Applica- 
tions in Engineering and Manufacturing 
Walter A. Shewhart 
Representative on the Business Research Council 
William A. Berridge 
Committee on Monographs 
Malcolm C. Rorty, Chairman Harry C. Carver 
Committee on the Place of the Next Annual Meeting 
William F. Ogburn, Chairman Edwin B. Wilson 
Willford I. King 


Committee on Fellows 
Horace Secrist, Chairman              Term expires December 31, 1931 
Joseph E. Pogue                       Term expires December 31, 1932 
Edgar Sydenstricker                   Term expires December 31, 1933 
S. L. Andrew                          Term expires December 31, 1934 
Irving Fisher                         Term expires December 31, 1935 


Committee on Social Statistics 
Neva R. Deardorff, Chairman Emma A. Winslow 
Emil Frankel Ralph G. Hurlin 
Maurice J. Karpf L. C. Marshall 
Philip Klein Meredith B. Givens 
F. Stuart Chapin Stuart A. Rice 


Committee to Coöperate with the Social Science Research Council 
Frederick C. Mills, Chairman Ralph G. Hurlin 
Carl Parry W. A. Berridge 
Harry C. Carver Edwin B. Wilson 





